Bo Pang, Qiong Wang, Min Yang, Mei Xue, Yicheng Zhang, Xiangling Deng, Zhixin Zhang, Wenquan Niu.
Abstract
Keywords: machine learning and deep learning; precocious puberty; prediction performance; school children; top factor
Year: 2022 PMID: 35846287 PMCID: PMC9279618 DOI: 10.3389/fendo.2022.892005
Source DB: PubMed Journal: Front Endocrinol (Lausanne) ISSN: 1664-2392 Impact factor: 6.055
The baseline characteristics of school girls stratified by the presence of precocious puberty.
| Survey factors | Category | Absence of precocious puberty (n = 5119) | Presence of precocious puberty (n = 408) | P |
|---|---|---|---|---|
| Age (months) | | 130.0 [106.0, 157.0] | 131.0 [111.0, 152.00] | 0.573 |
| Ethnicity (%) | Han | 4888 (95.5) | 393 (96.3) | 0.892 |
| | Man | 143 (2.8) | 9 (2.2) | |
| | Hui | 12 (0.2) | 0 (0.0) | |
| | Others | 76 (1.5) | 6 (1.5) | |
| WHtR | | 0.43 [0.39, 0.48] | 0.45 [0.40, 0.50] | <0.001 |
| BMI | | 18.07 [15.61, 21.23] | 20.05 [17.79, 23.29] | <0.001 |
| Fiber-rich foods (%) | None or occasionally | 90 (1.8%) | 11 (2.7%) | 0.095 |
| | 1-2 times weekly | 793 (15.5%) | 77 (18.9%) | |
| | 3-5 times weekly | 1446 (28.2%) | 117 (28.7%) | |
| | Every day | 2790 (54.5%) | 203 (49.8%) | |
| Out-of-season foods (%) | None or occasionally | 688 (13.4%) | 53 (13.0%) | 0.041 |
| | 1-2 times weekly | 1845 (36.0%) | 174 (42.6%) | |
| | 3-5 times weekly | 1399 (27.3%) | 105 (25.7%) | |
| | Every day | 1187 (23.2%) | 76 (18.6%) | |
| Animal protein foods (%) | None or occasionally | 60 (1.2%) | 7 (1.7%) | 0.565 |
| | 1-2 times weekly | 834 (16.3%) | 62 (15.2%) | |
| | 3-5 times weekly | 1643 (32.1%) | 139 (34.1%) | |
| | Every day | 2582 (50.4%) | 200 (49.0%) | |
| Plant protein foods (%) | None or occasionally | 424 (8.3%) | 38 (9.3%) | 0.357 |
| | 1-2 times weekly | 2038 (39.8%) | 175 (42.9%) | |
| | 3-5 times weekly | 1436 (28.1%) | 111 (27.2%) | |
| | Every day | 1221 (23.9%) | 84 (20.6%) | |
| Milk products (%) | None or occasionally | 179 (3.5%) | 6 (1.5%) | 0.112 |
| | 1-2 times weekly | 755 (14.7%) | 55 (13.5%) | |
| | 3-5 times weekly | 1314 (25.7%) | 108 (26.5%) | |
| | Every day | 2871 (56.1%) | 239 (58.6%) | |
| Tonic foods (%) | None or occasionally | 4218 (82.4%) | 352 (86.3%) | 0.077 |
| | 1-2 times weekly | 517 (10.1%) | 37 (9.1%) | |
| | 3-5 times weekly | 181 (3.5%) | 12 (2.9%) | |
| | Every day | 203 (4.0%) | 7 (1.7%) | |
| Food with preservatives (%) | None or occasionally | 2831 (55.3%) | 219 (53.7%) | 0.094 |
| | 1-2 times weekly | 1730 (33.8%) | 153 (37.5%) | |
| | 3-5 times weekly | 366 (7.1%) | 29 (7.1%) | |
| | Every day | 192 (3.8%) | 7 (1.7%) | |
| Fast foods (%) | None or occasionally | 2357 (46.0%) | 187 (45.8%) | 0.335 |
| | 1-2 times weekly | 2416 (47.2%) | 195 (47.8%) | |
| | 3-5 times weekly | 233 (4.6%) | 22 (5.4%) | |
| | Every day | 113 (2.2%) | 4 (1.0%) | |
| Snacks (%) | None or occasionally | 909 (17.8%) | 58 (14.2%) | 0.087 |
| | 1-2 times weekly | 2907 (56.8%) | 244 (59.8%) | |
| | 3-5 times weekly | 914 (17.9%) | 83 (20.3%) | |
| | Every day | 389 (7.6%) | 23 (5.6%) | |
| Sweet foods (%) | None or occasionally | 885 (17.3%) | 57 (14.0%) | 0.126 |
| | 1-2 times weekly | 3001 (58.6%) | 244 (59.8%) | |
| | 3-5 times weekly | 932 (18.2%) | 88 (21.6%) | |
| | Every day | 301 (5.9%) | 19 (4.7%) | |
| Eating speed (minutes) | | 16.67 [13.33, 20.00] | 16.67 [13.33, 20.00] | 0.063 |
| Night meal (%) | None or occasionally | 2653 (51.8) | 219 (53.7) | 0.02 |
| | 1-2 times weekly | 1482 (29.0) | 135 (33.1) | |
| | 3-5 times weekly | 533 (10.4) | 29 (7.1) | |
| | Every day | 451 (8.8) | 25 (6.1) | |
| Sleep with lights on (%) | None or occasionally | 4426 (86.5) | 355 (87.0) | 0.99 |
| | 1-2 times weekly | 298 (5.8) | 24 (5.9) | |
| | 3-5 times weekly | 126 (2.5) | 9 (2.2) | |
| | Every day | 269 (5.3) | 20 (4.9) | |
| Monophagia (%) | None or occasionally | 2552 (49.9%) | 239 (58.6%) | 0.005 |
| | 1-2 times weekly | 1534 (30.0%) | 110 (27.0%) | |
| | 3-5 times weekly | 561 (11.0%) | 32 (7.8%) | |
| | Every day | 472 (9.2%) | 27 (6.6%) | |
| Use of plastic tableware (%) | None or occasionally | 3328 (65.0%) | 264 (64.7%) | 0.657 |
| | 1-2 times weekly | 1036 (20.2%) | 88 (21.6%) | |
| | 3-5 times weekly | 311 (6.1%) | 27 (6.6%) | |
| | Every day | 444 (8.7%) | 29 (7.1%) | |
| Cosmetics exposure (%) | None or occasionally | 4650 (90.8%) | 368 (90.2%) | 0.038 |
| | 1-2 times weekly | 301 (5.9%) | 18 (4.4%) | |
| | 3-5 times weekly | 65 (1.3%) | 12 (2.9%) | |
| | Every day | 103 (2.0%) | 10 (2.5%) | |
| Physical activity (hours per day) | | 1.29 [1.00, 1.57] | 1.00 [0.86, 1.57] | 0.002 |
| Sitting duration (hours per day) | | 5.86 [3.43, 7.43] | 6.29 [4.14, 7.43] | 0.032 |
| Screen time (hours per day) | | 1.29 [0.64, 1.57] | 1.29 [0.89, 2.00] | <0.001 |
| Sleep duration (hours per day) | | 9.00 [8.29, 9.29] | 8.71 [8.29, 9.29] | 0.067 |
| Fall asleep time (hours per day) | | 10.00 [9.00, 10.00] | 10.00 [9.50, 10.00] | 0.008 |
| Pregnancy order (%) | 1 | 3404 (66.8%) | 269 (66.1%) | 0.437 |
| | 2 | 1260 (24.7%) | 96 (23.6%) | |
| | 3 | 340 (6.7%) | 30 (7.4%) | |
| | 4 | 72 (1.4%) | 10 (2.5%) | |
| | 5 | 21 (0.4%) | 2 (0.5%) | |
| Delivery order (%) | 1 | 4303 (84.4%) | 363 (89.0%) | 0.096 |
| | 2 | 710 (13.9%) | 41 (10.0%) | |
| | 3 | 67 (1.3%) | 3 (0.7%) | |
| | 4 | 17 (0.3%) | 1 (0.2%) | |
| Delivery mode (%) | Vaginal delivery | 2558 (50.0%) | 190 (46.6%) | 0.198 |
| | Cesarean section | 2561 (50.0%) | 218 (53.4%) | |
| Assisted reproductive technology (%) | Unused | 5043 (98.5%) | 401 (98.3%) | 0.671 |
| | Used | 76 (1.5%) | 7 (1.7%) | |
| Gestational week | | 39.00 [38.00, 40.00] | 39.00 [38.00, 40.00] | 0.006 |
| Birth weight (kg) | | 3.30 [3.00, 3.60] | 3.30 [3.00, 3.50] | 0.038 |
| Birth body length (cm) | | 50.00 [50.00, 52.00] | 50.00 [50.00, 52.00] | 0.922 |
| Bearing age of father | | 27.58 [25.67, 30.08] | 27.33 [25.75, 29.50] | 0.529 |
| Bearing age of mother | | 26.58 [24.50, 28.92] | 26.33 [24.33, 28.33] | 0.218 |
| Infancy feeding (%) | Breastfeeding | 3027 (59.1%) | 226 (55.4%) | 0.217 |
| | Mixed feeding | 1523 (29.8%) | 127 (31.1%) | |
| | Artificial feeding | 569 (11.1%) | 55 (13.5%) | |
| Breastfeeding duration (months) | | 8.00 [0.00, 13.00] | 7.00 [0.00, 12.00] | 0.123 |
| Time to add complementary (months) | | 6.00 [6.00, 7.00] | 6.00 [6.00, 6.00] | 0.008 |
| Maternal age at menarche | | 13.00 [12.00, 14.00] | 13.00 [12.00, 14.00] | <0.001 |
| Paternal BMI | | 25.43 [23.39, 27.76] | 25.95 [23.75, 28.49] | 0.005 |
| Maternal BMI | | 22.92 [20.82, 25.39] | 23.90 [21.48, 26.44] | <0.001 |
| Number of relatives with hypertension (%) | 0 | 2420 (47.3%) | 149 (36.5%) | <0.001 |
| | 1 | 1186 (23.2%) | 120 (29.4%) | |
| | 2 | 939 (18.3%) | 88 (21.6%) | |
| | 3 | 412 (8.0%) | 34 (8.3%) | |
| | 4 | 162 (3.2%) | 17 (4.2%) | |
| Number of relatives with diabetes (%) | 0 | 3471 (67.8%) | 266 (65.2%) | 0.131 |
| | 1 | 1216 (23.8%) | 96 (23.5%) | |
| | 2 | 348 (6.8%) | 40 (9.8%) | |
| | 3 | 63 (1.2%) | 3 (0.7%) | |
| | 4 | 21 (0.4%) | 3 (0.7%) | |
| Paternal education (%) | Junior high school degree or below | 803 (15.7%) | 54 (13.2%) | 0.167 |
| | High school degree | 1857 (36.3%) | 139 (34.1%) | |
| | Bachelor’s degree or above | 2459 (48.0%) | 215 (52.7%) | |
| Maternal education (%) | Middle school degree or below | 806 (15.7%) | 48 (11.8%) | 0.074 |
| | High school degree | 1490 (29.1%) | 118 (28.9%) | |
| | Bachelor’s degree or above | 2823 (55.1%) | 242 (59.3%) | |
| Household income (RMB per year) | <100,000 | 2420 (47.3%) | 162 (39.7%) | 0.003 |
| | 100,000-300,000 | 2299 (44.9%) | 200 (49.0%) | |
| | >300,000 | 400 (7.8%) | 46 (11.3%) | |
Continuous data are expressed as mean (standard deviation) for normal distributions and median [interquartile range] for skewed distributions. Categorical data are expressed as count (percentage). P values for the comparison between girls without and with precocious puberty were derived by t test for normally distributed continuous data, by rank-sum test for skewed continuous data, and by χ2 test for categorical data. WHtR, waist-to-height ratio; BMI, body mass index.
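As an illustration of the χ2 comparison described in the footnote, the following minimal sketch recomputes the test for the household income rows of the table. The counts are copied from the table; the function name `chi_square_statistic` is hypothetical, and the sketch assumes a plain Pearson χ2 test without continuity correction, which may differ in detail from the authors' software.

```python
import math

# Counts copied from the "Household income" rows of the baseline table.
# Rows: absence / presence of precocious puberty; columns: income bands.
observed = [
    [2420, 2299, 400],  # absence (n = 5119)
    [162, 200, 46],     # presence (n = 408)
]


def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_square_statistic(observed)

# With exactly 2 degrees of freedom the chi-square survival function
# has the closed form exp(-x / 2), so no statistics library is needed.
if dof != 2:
    raise ValueError("closed-form p-value below only holds for dof == 2")
p_value = math.exp(-stat / 2)

print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Under these assumptions the p-value rounds to the 0.003 reported in the table for household income.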
Figure 1. The prediction accuracy of 13 machine learning algorithms, along with hard and soft voting classifiers.
Prediction performance of 13 machine learning algorithms for precocious puberty in terms of accuracy, precision, recall, F1 score and AUROC.
| Algorithms | Accuracy | Precision | Recall | F1 score | AUROC |
|---|---|---|---|---|---|
| Logistic regression | 0.9534 | 1.0000 | 0.3439 | 0.5118 | 0.7799 |
| Decision tree | 0.8892 | 0.2864 | 0.3758 | 0.3251 | 0.6521 |
| Adaboost decision tree | 0.9507 | 0.9286 | 0.3312 | 0.4883 | 0.7724 |
| Support vector machine | 0.9534 | 1.0000 | 0.3439 | 0.5118 | 0.7576 |
| Random forest | 0.9534 | 1.0000 | 0.3439 | 0.5118 | 0.7513 |
| K-nearest neighbor | 0.9290 | 0.0000 | 0.0000 | 0.0000 | 0.5743 |
| Gradient boosting machine | 0.9521 | 0.9322 | 0.3503 | 0.5093 | 0.7838 |
| Extreme gradient boosting | 0.9539 | 1.0000 | 0.3503 | 0.5189 | 0.6752 |
| Light gradient boosting machine | 0.9539 | 1.0000 | 0.3503 | 0.5189 | 0.7822 |
| Multi-layer perceptron | 0.9290 | 0.0000 | 0.0000 | 0.0000 | 0.5057 |
| Gaussian naive Bayes | 0.9398 | 0.6395 | 0.3503 | 0.4527 | 0.7658 |
| Multinomial naive Bayes | 0.9534 | 1.0000 | 0.3439 | 0.5118 | 0.7202 |
| Bernoulli naive Bayes | 0.9534 | 1.0000 | 0.3439 | 0.5118 | 0.6960 |
AUROC, area under the receiver operating characteristic curve.
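Because the F1 score is the harmonic mean of precision and recall, the F1 column can be checked against the other two. The sketch below does this for two rows of the table and also computes the accuracy of a trivial classifier that always predicts "absence" on the full cohort (5119 of 5527 girls), which gives context for the high accuracy values. The helper name `f1_score` is illustrative, and the cohort-level baseline is an assumption for comparison only, since the table reports test-set metrics.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "Adaboost decision tree": (0.9286, 0.3312, 0.4883),
    "Support vector machine": (1.0000, 0.3439, 0.5118),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.4f}, reported = {rows[name][2]:.4f}")

# Accuracy of always predicting "absence" on the full cohort
# (counts from the baseline table: 5119 without, 408 with precocious puberty).
baseline_accuracy = 5119 / (5119 + 408)
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

A baseline near 0.93 suggests that accuracy alone says little here, and that recall, F1 score and AUROC are the more informative columns.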
Figure 2. The area under the receiver operating characteristic curve (AUROC) of the gradient boosting machine algorithm for the prediction of precocious puberty. AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic curve.
Figure 3. Top 20 factors for predicting precocious puberty in descending order of importance. BMI, body mass index.
Area under the receiver operating characteristic curve (AUROC), accuracy and precision as the top ten factors are added cumulatively in rank order.
| Number of top-ranked factors included | AUROC | Accuracy | Precision |
|---|---|---|---|
| 1 | 0.6720 | 0.9534 | 1.0000 |
| 2 | 0.6862 | 0.9534 | 1.0000 |
| 3 | 0.7137 | 0.9534 | 1.0000 |
| 4 | 0.7202 | 0.9534 | 1.0000 |
| 5 | 0.7457 | 0.9525 | 0.9642 |
| 6 | 0.7861 | 0.9530 | 0.9818 |
| 7 | 0.7863 | 0.9530 | 0.9818 |
| 8 | 0.7852 | 0.9525 | 0.9333 |
| 9 | 0.7806 | 0.9520 | 0.9473 |
| 10 | 0.7722 | 0.9516 | 0.9310 |
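One way to read this table is to look for the smallest factor set whose AUROC is close to the best value. The sketch below applies such a rule to the AUROC column; the tolerance of 0.001 and the function name `smallest_k_within` are assumptions made for illustration and are not stated as the authors' selection criterion.

```python
# AUROC by number of top-ranked factors, copied from the table above.
auroc_by_k = {
    1: 0.6720, 2: 0.6862, 3: 0.7137, 4: 0.7202, 5: 0.7457,
    6: 0.7861, 7: 0.7863, 8: 0.7852, 9: 0.7806, 10: 0.7722,
}


def smallest_k_within(scores, tolerance):
    """Smallest k whose score is within `tolerance` of the best score."""
    best = max(scores.values())
    return min(k for k, score in scores.items() if best - score <= tolerance)


# Number of factors with the single highest AUROC.
best_k = max(auroc_by_k, key=auroc_by_k.get)

# Most parsimonious factor count that is nearly as good (assumed tolerance).
parsimonious_k = smallest_k_within(auroc_by_k, tolerance=0.001)

print(f"best k = {best_k}, parsimonious k = {parsimonious_k}")
```

With this tolerance the rule picks six factors, since the gain from a seventh factor is only 0.0002 in AUROC.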
Model loss and accuracy of deep learning sequential model in both training and testing groups.
| Factor set | Optimizer | Training loss | Training accuracy | Testing loss | Testing accuracy |
|---|---|---|---|---|---|
| All factors | Adam | 11.68% | 96.32% | 31.97% | 93.76% |
| | RMSprop | 15.30% | 95.52% | 26.05% | 94.12% |
| | SGD | 15.86% | 95.75% | 23.69% | 93.80% |
| Top 6 factors | Adam | 26.31% | 92.57% | 25.71% | 92.90% |
| | RMSprop | 24.06% | 93.51% | 25.66% | 92.90% |
| | SGD | 25.20% | 93.07% | 25.63% | 92.90% |
Adam, adaptive moment estimation; RMSprop, root mean square prop; SGD, stochastic gradient descent.
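The gap between training and testing accuracy gives a rough sense of how well each configuration generalizes. The sketch below computes that gap in percentage points from the values in the table; the variable names are illustrative, and the gap is a simple descriptive quantity rather than a metric reported by the authors.

```python
# (training accuracy %, testing accuracy %) copied from the table above.
results = {
    ("All factors", "Adam"): (96.32, 93.76),
    ("All factors", "RMSprop"): (95.52, 94.12),
    ("All factors", "SGD"): (95.75, 93.80),
    ("Top 6 factors", "Adam"): (92.57, 92.90),
    ("Top 6 factors", "RMSprop"): (93.51, 92.90),
    ("Top 6 factors", "SGD"): (93.07, 92.90),
}

# Generalization gap in percentage points: training minus testing accuracy.
gaps = {key: train - test for key, (train, test) in results.items()}

for (factor_set, optimizer), gap in gaps.items():
    print(f"{factor_set:13s} {optimizer:8s} gap = {gap:+.2f} points")
```

In this reading, the models trained on all factors show gaps of roughly 1.4 to 2.6 points, while the top 6 factor models stay within about 0.6 points of their training accuracy.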
Risk prediction for precocious puberty by the top 6 factors in the logistic regression model.
| Top 6 factors | OR (95% CI) | P |
|---|---|---|
| Maternal age at menarche | 0.74 (0.69, 0.80) | <0.001 |
| Paternal BMI | 1.04 (1.02, 1.07) | 0.003 |
| Waist-to-height ratio | 1.31 (1.15, 1.49) | <0.001 |
| Maternal BMI | 1.06 (1.03, 1.10) | <0.001 |
| Screen time | 1.12 (1.04, 1.22) | 0.003 |
| Physical activity | 0.78 (0.66, 0.93) | 0.006 |
OR, odds ratio; 95% CI, 95% confidence interval.
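An odds ratio and its 95% confidence interval can be converted back to an approximate logistic regression coefficient, standard error and Wald p-value. The sketch below does this for three rows of the table, assuming the intervals are symmetric Wald intervals on the log-odds scale; the function name `wald_summary` is hypothetical. Because the published values are rounded to two decimals, the recomputed p-values are only approximate and need not match the reported ones exactly.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

# (OR, lower 95% CI, upper 95% CI) copied from the table above.
rows = {
    "Waist-to-height ratio": (1.31, 1.15, 1.49),
    "Screen time": (1.12, 1.04, 1.22),
    "Physical activity": (0.78, 0.66, 0.93),
}


def wald_summary(odds_ratio, ci_low, ci_high):
    """Approximate coefficient, standard error, z and two-sided p."""
    beta = math.log(odds_ratio)  # log-odds coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


summaries = {name: wald_summary(*values) for name, values in rows.items()}
for name, (beta, se, z, p) in summaries.items():
    print(f"{name}: beta = {beta:+.3f}, SE = {se:.3f}, z = {z:+.2f}, p = {p:.4f}")
```

A negative coefficient, as for physical activity, corresponds to an odds ratio below 1, meaning lower odds of precocious puberty per unit increase in that factor.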