Mei Xue1,2, Qiong Wang1,2, Yicheng Zhang1,2, Bo Pang1,2, Min Yang1,2, Xiangling Deng1,2, Zhixin Zhang2,3, Wenquan Niu4.
Abstract
Aims: We employed machine-learning methods to explore data from a large survey of students, with the goal of identifying and validating a thrifty panel of important factors associated with lower respiratory tract infection (LRTI).
Keywords: deep learning; factor; lower respiratory tract infection; machine learning; performance
Year: 2022 PMID: 35783299 PMCID: PMC9243225 DOI: 10.3389/fped.2022.911591
Source DB: PubMed Journal: Front Pediatr ISSN: 2296-2360 Impact factor: 3.569
Baseline characteristics of participating students by lower respiratory tract infection (LRTI) status.
| Characteristic | Without LRTI (n = 10,399) | With LRTI (n = 909) | P value |
|---|---|---|---|
| Sex (%) | | | 0.080 |
| Boys | 5,291 (50.9) | 490 (53.9) | |
| Girls | 5,108 (49.1) | 419 (46.1) | |
| Age (months) | 128 (105, 153) | 110 (94, 132) | <0.001 |
| Waist-hip ratio | 0.85 (0.79, 0.91) | 0.86 (0.81, 0.92) | <0.001 |
| BMI | 18.83 (16.14, 22.52) | 18.56 (15.86, 22.35) | 0.045 |
| Gestational age | 39 (38, 40) | 39 (38, 40) | 0.119 |
| Twins (%) | | | 0.554 |
| No | 10,137 (97.5) | 889 (97.8) | |
| Yes | 262 (2.5) | 20 (2.2) | |
| Chronic disease (%) | | | 0.839 |
| No | 10,259 (99.5) | 880 (99.4) | |
| Yes | 53 (0.5) | 5 (0.6) | |
| Number of dental caries (%) | | | <0.001 |
| 0 | 5,536 (53.2) | 383 (42.1) | |
| 1 | 1,412 (13.6) | 130 (14.3) | |
| 2 | 1,650 (15.9) | 165 (18.2) | |
| 3 | 730 (7.0) | 90 (9.9) | |
| 4 | 536 (5.2) | 67 (7.4) | |
| ≥5 | 535 (5.1) | 74 (8.1) | |
| Rhinitis (%) | | | <0.001 |
| No | 7,979 (76.7) | 461 (50.7) | |
| Yes | 2,420 (23.3) | 448 (49.3) | |
| Eczema (%) | | | <0.001 |
| No | 8,313 (79.9) | 577 (63.5) | |
| Yes | 2,086 (20.1) | 332 (36.5) | |
| Allergy (food/drug) (%) | | | <0.001 |
| No | 9,224 (88.7) | 721 (79.3) | |
| Yes | 1,175 (11.3) | 188 (20.7) | |
| Eating speed (minutes) | 16.67 (13.33, 20.00) | 16.67 (13.33, 21.67) | 0.035 |
| Fall asleep time (hours per day) | 10.00 (9.00, 10.00) | 10.00 (9.00, 10.00) | 0.002 |
| Sleep duration (hours per day) | 9.00 (8.29, 9.29) | 9.00 (8.29, 9.29) | 0.002 |
| Sitting duration (hours per day) | 5.71 (3.43, 7.43) | 5.43 (2.79, 7.00) | <0.001 |
| Screen time (hours per day) | 1.29 (0.64, 1.86) | 1.29 (0.79, 1.57) | 0.708 |
| Daily time of outdoor activities (hours per day) | 1.29 (1.00, 1.64) | 1.29 (1.00, 1.57) | 0.952 |
| Weekly intake frequency of dietary fiber (%) | | | 0.665 |
| Every day | 235 (2.3) | 16 (1.8) | |
| ≥3 times per week | 1,656 (15.9) | 154 (16.9) | |
| 1–2 times per week | 2,976 (28.6) | 262 (28.8) | |
| Hardly | 5,532 (53.2) | 477 (52.5) | |
| Weekly intake frequency of out-of-season fruit (%) | | | 0.281 |
| Every day | 1,427 (13.7) | 104 (11.4) | |
| ≥3 times per week | 3,697 (35.6) | 327 (36.0) | |
| 1–2 times per week | 2,877 (27.7) | 261 (28.7) | |
| Hardly | 2,398 (23.1) | 217 (23.9) | |
| Weekly intake frequency of animal protein (%) | | | 0.465 |
| Every day | 154 (1.5) | 8 (0.9) | |
| ≥3 times per week | 1,475 (14.2) | 133 (14.6) | |
| 1–2 times per week | 3,311 (31.8) | 298 (32.8) | |
| Hardly | 5,459 (52.5) | 470 (51.7) | |
| Weekly intake frequency of soy protein (%) | | | 0.115 |
| Every day | 820 (7.9) | 77 (8.5) | |
| ≥3 times per week | 4,212 (40.5) | 393 (43.2) | |
| 1–2 times per week | 2,851 (27.4) | 250 (27.5) | |
| Hardly | 2,516 (24.2) | 189 (20.8) | |
| Weekly intake frequency of milk (%) | | | 0.033 |
| Every day | 305 (2.9) | 36 (4.0) | |
| ≥3 times per week | 1,322 (12.7) | 104 (11.4) | |
| 1–2 times per week | 2,526 (24.3) | 250 (27.5) | |
| Hardly | 6,246 (60.1) | 519 (57.1) | |
| Weekly intake frequency of dietary supplement (%) | | | 0.120 |
| Every day | 8,592 (82.6) | 769 (84.6) | |
| ≥3 times per week | 990 (9.5) | 78 (8.6) | |
| 1–2 times per week | 355 (3.4) | 35 (3.9) | |
| Hardly | 462 (4.4) | 27 (3.0) | |
| Weekly intake frequency of food containing preservatives (%) | | | 0.006 |
| Every day | 5,770 (55.5) | 460 (50.6) | |
| ≥3 times per week | 3,517 (33.8) | 335 (36.9) | |
| 1–2 times per week | 690 (6.6) | 81 (8.9) | |
| Hardly | 422 (4.1) | 33 (3.6) | |
| Weekly intake frequency of fast food (%) | | | 0.021 |
| Every day | 4,716 (45.4) | 367 (40.4) | |
| ≥3 times per week | 4,919 (47.3) | 464 (51.0) | |
| 1–2 times per week | 480 (4.6) | 53 (5.8) | |
| Hardly | 284 (2.7) | 25 (2.8) | |
| Weekly intake frequency of snacks (%) | | | 0.071 |
| Every day | 2,106 (20.3) | 162 (17.8) | |
| ≥3 times per week | 5,762 (55.4) | 502 (55.2) | |
| 1–2 times per week | 1,769 (17.0) | 182 (20.0) | |
| Hardly | 762 (7.3) | 63 (6.9) | |
| Weekly intake frequency of sweet food (%) | | | 0.048 |
| Every day | 2,091 (20.1) | 148 (16.3) | |
| ≥3 times per week | 5,947 (57.2) | 539 (59.3) | |
| 1–2 times per week | 1,774 (17.1) | 166 (18.3) | |
| Hardly | 587 (5.6) | 56 (6.2) | |
| Weekly intake frequency of night meals (%) | | | 0.027 |
| Every day | 5,507 (53.0) | 440 (48.4) | |
| ≥3 times per week | 2,942 (28.3) | 287 (31.6) | |
| 1–2 times per week | 1,068 (10.3) | 90 (9.9) | |
| Hardly | 882 (8.5) | 92 (10.1) | |
| Frequency of sleeping with the light on (%) | | | 0.259 |
| Every day | 9,004 (86.6) | 775 (85.3) | |
| ≥3 times per week | 586 (5.6) | 56 (6.2) | |
| 1–2 times per week | 260 (2.5) | 32 (3.5) | |
| Hardly | 549 (5.3) | 46 (5.1) | |
| Picky eating frequency per week (%) | | | 0.001 |
| Every day | 5,293 (50.9) | 407 (44.8) | |
| ≥3 times per week | 2,993 (28.8) | 281 (30.9) | |
| 1–2 times per week | 1,141 (11.0) | 109 (12.0) | |
| Hardly | 972 (9.3) | 112 (12.3) | |
| Frequency of using plastic tableware (%) | | | 0.154 |
| Every day | 6,842 (65.8) | 566 (62.3) | |
| ≥3 times per week | 2,063 (19.8) | 197 (21.7) | |
| 1–2 times per week | 631 (6.1) | 57 (6.3) | |
| Hardly | 863 (8.3) | 89 (9.8) | |
| Frequency of using make-up (%) | | | 0.876 |
| Every day | 9,564 (92.0) | 836 (92.0) | |
| ≥3 times per week | 466 (4.5) | 44 (4.8) | |
| 1–2 times per week | 141 (1.4) | 10 (1.1) | |
| Hardly | 228 (2.2) | 19 (2.1) | |
| Stool frequency (%) | | | 0.284 |
| 1–2 times per day | 7,700 (74.0) | 697 (76.7) | |
| 3–4 times per day | 345 (3.3) | 23 (2.5) | |
| ≥4 times per day | 323 (3.1) | 32 (3.5) | |
| 2–3 times per week | 1,719 (16.5) | 133 (14.6) | |
| 0 or once per week | 312 (3.0) | 24 (2.6) | |
| Stool consistency (%) | | | 0.022 |
| Separate hard lumps, like nuts | 227 (2.2) | 27 (3.0) | |
| Sausage-shaped but lumpy | 1,383 (13.3) | 145 (16.0) | |
| Like a sausage or snake but with cracks on its surface | 1,899 (18.3) | 175 (19.3) | |
| Like a sausage or snake, smooth and soft, fluffy pieces, watery | 6,890 (66.3) | 562 (61.8) | |
| Pregnancy order (%) | | | 0.779 |
| 1 | 6,773 (65.4) | 587 (64.7) | |
| ≥2 | 3,626 (34.6) | 322 (35.3) | |
| Delivery order (%) | | | 0.095 |
| 1 | 8,728 (84.2) | 782 (86.5) | |
| ≥2 | 1,671 (15.8) | 127 (13.5) | |
| Delivery mode (%) | | | 0.007 |
| Vaginal delivery | 4,996 (48.0) | 394 (43.3) | |
| Cesarean section | 5,403 (52.0) | 515 (56.7) | |
| Birth weight (g) | 3,369.62 (455.48) | 3,351.91 (453.43) | 0.283 |
| Birth body length (cm) | 50.77 (2.62) | 50.92 (2.58) | 0.096 |
| Infant feeding (%) | | | 0.001 |
| Pure breastfeeding | 6,056 (58.2) | 471 (51.8) | |
| Partial breastfeeding | 3,177 (30.6) | 322 (35.4) | |
| Non-breastfeeding | 1,166 (11.2) | 116 (12.8) | |
| Breastfeeding duration | 8.00 (0.00, 12.00) | 6.00 (0.00, 13.00) | 0.055 |
| Time of adding solid food | 6.00 (6.00, 7.00) | 6.00 (6.00, 7.00) | 0.318 |
| Paternal BMI | 25.62 (23.46, 27.78) | 25.83 (23.66, 27.78) | 0.333 |
| Maternal BMI | 22.86 (20.81, 25.39) | 22.86 (20.96, 25.81) | 0.134 |
| Paternal age at the child's birth | 27.58 (25.67, 30.08) | 27.42 (25.58, 29.58) | 0.244 |
| Maternal age at the child's birth | 26.58 (24.50, 28.83) | 26.62 (24.75, 28.67) | 0.538 |
| Paternal age | 39.16 (4.28) | 37.95 (4.09) | <0.001 |
| Maternal age | 37.84 (4.04) | 36.80 (3.79) | <0.001 |
| Menarche | 13.54 (1.60) | 13.52 (1.59) | 0.74 |
| Maternal education (%) | | | <0.001 |
| Middle school degree or below | 1,661 (16.0) | 103 (11.3) | |
| High school degree | 3,060 (29.4) | 232 (25.5) | |
| College degree or above | 5,678 (54.6) | 574 (63.1) | |
| Paternal education (%) | | | <0.001 |
| Middle school degree or below | 1,692 (16.3) | 115 (12.7) | |
| High school degree | 3,778 (36.3) | 289 (31.8) | |
| College degree or above | 4,929 (47.4) | 505 (55.6) | |
| Family income (RMB per year) (%) | | | 0.010 |
| <100,000 | 4,888 (47.0) | 384 (42.2) | |
| 100,000–300,000 | 4,641 (44.6) | 453 (49.8) | |
| ≥300,000 | 870 (8.4) | 72 (7.9) | |
| Number of relatives with hypertension (%) | | | 0.002 |
| 0 | 4,849 (46.6) | 364 (40.0) | |
| 1 | 2,448 (23.5) | 226 (24.9) | |
| 2 | 1,888 (18.2) | 202 (22.2) | |
| 3 | 851 (8.2) | 81 (8.9) | |
| 4 | 363 (3.5) | 36 (4.0) | |
| Number of relatives with diabetes (%) | | | <0.001 |
| 0 | 7,086 (68.1) | 549 (60.4) | |
| 1 | 2,447 (23.5) | 256 (28.2) | |
| 2 | 682 (6.6) | 84 (9.2) | |
| 3 | 140 (1.3) | 13 (1.4) | |
| 4 | 44 (0.4) | 7 (0.8) | |
Continuous data are expressed as mean (standard deviation) or median (interquartile range). Categorical data are expressed as count (percentage). P values for comparisons between children with and without lower respiratory tract infection were derived by t test for normally distributed continuous data, by rank-sum test for skewed continuous data, and by χ² test for categorical data.
BMI, body mass index.
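The χ² comparisons for categorical rows can be checked directly against the published counts. Below is a minimal pure-Python sketch of a Pearson χ² test of independence without continuity correction (an assumption; the footnote does not state whether a correction was applied). The helper names `chi_square_test` and `chi2_sf` are hypothetical, and the P value comes from the regularized incomplete gamma function. Applied to the sex row ([boys, girls] by [without LRTI, with LRTI]), it gives P ≈ 0.080, in line with the table.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with df degrees of freedom.

    Equals the regularized upper incomplete gamma Q(df/2, stat/2): a power series is
    used for small x, a continued fraction (modified Lentz) otherwise.
    """
    s, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1:
        # Series for the lower regularized gamma P(s, x); the tail is 1 - P.
        term = total = 1.0 / s
        a = s
        for _ in range(1000):
            a += 1
            term *= x / a
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for the upper regularized gamma Q(s, x).
    tiny = 1e-300
    b = x + 1 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - s)
        b += 2
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi_square_test(table):
    """Pearson chi-square test of independence for an r x c table of counts (no Yates correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Sex row of the baseline table: rows = boys, girls; columns = without LRTI, with LRTI.
sex = [[5291, 490], [5108, 419]]
stat, df, p = chi_square_test(sex)
print(f"chi2 = {stat:.3f}, df = {df}, P = {p:.3f}")
```

The same function accepts multi-level rows (for example, the six dental-caries categories as a 6 × 2 table), with degrees of freedom (r − 1)(c − 1).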
Figure 1. Hard and soft voting classifications based on 11 machine-learning algorithms for lower respiratory tract infection. The red solid circle represents the accuracy.
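In hard voting, each model casts one vote for a class label and the majority wins; in soft voting, the models' predicted probabilities are averaged and the class with the higher mean probability is chosen. The sketch below illustrates both rules for the binary LRTI outcome with three hypothetical model outputs; the function names are illustrative, not the study's code. (scikit-learn's `VotingClassifier` offers the same two modes through `voting="hard"` or `voting="soft"`; the source does not say which implementation was used.)

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over predicted class labels.

    Counter.most_common breaks ties by first occurrence, so an odd number of
    voters avoids ties for a binary outcome.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Average the models' predicted P(LRTI) and apply a decision threshold."""
    mean_p = sum(probabilities) / len(probabilities)
    return int(mean_p >= threshold)


# Hypothetical P(LRTI) from three fitted models for one child.
probs = [0.90, 0.40, 0.45]
labels = [int(p >= 0.5) for p in probs]  # one vote for LRTI, two against

print("hard vote:", hard_vote(labels))  # majority of labels
print("soft vote:", soft_vote(probs))   # mean probability vs. 0.5
```

Here the two rules disagree: two lukewarm "no" votes win the hard vote, while the single confident model pulls the averaged probability above 0.5 in the soft vote. This is how the two schemes can reach different accuracies on the same models.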
Prediction performance of 11 machine learning algorithms for lower respiratory tract infection using accuracy, precision, recall, F1 score and area under the receiver operating characteristic curve (AUROC).
| Algorithm | Accuracy | Precision | Recall | F1 score | AUROC |
|---|---|---|---|---|---|
| Logistic regression | 0.922 | <0.001 | <0.001 | <0.001 | 0.710 |
| Decision tree | 0.850 | 0.120 | 0.148 | 0.133 | 0.528 |
| Support vector machine | 0.922 | <0.001 | <0.001 | <0.001 | 0.588 |
| Random forest | 0.922 | <0.001 | <0.001 | <0.001 | 0.654 |
| K-nearest neighbor | 0.922 | <0.001 | <0.001 | <0.001 | 0.514 |
| Gradient boosting machine | 0.921 | 0.143 | 0.003 | 0.006 | 0.682 |
| Extreme gradient boosting | 0.918 | 0.257 | 0.026 | 0.047 | 0.510 |
| Light gradient boosting machine | 0.920 | 0.083 | 0.003 | 0.006 | 0.643 |
| Gaussian naive Bayes | 0.856 | 0.140 | 0.165 | 0.151 | 0.652 |
| Multinomial naive Bayes | 0.922 | 1.000 | 0.003 | 0.006 | 0.663 |
| Bernoulli naive Bayes | 0.922 | 1.000 | 0.003 | 0.006 | 0.682 |
AUROC, area under the receiver operating characteristic curve.
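Several algorithms above report an accuracy of 0.922 with near-zero precision, recall, and F1 score. That pattern is consistent with a classifier that labels nearly every child as LRTI-free in a sample where only about 8% (909 of 11,308) have LRTI. The sketch below computes the four threshold-based metrics from confusion-matrix counts; the function names and the all-negative example are illustrative assumptions, not the study's code.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class (LRTI)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions -> 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical classifier that predicts "no LRTI" for every child in the full sample.
all_negative = classification_metrics(tp=0, fp=0, fn=909, tn=10399)
print({k: round(v, 3) for k, v in all_negative.items()})

# Recomputing F1 from the reported precision and recall of multinomial naive Bayes.
print(round(f1_from(1.000, 0.003), 3))
```

The all-negative rule already reaches an accuracy of about 0.920 on the full sample, close to the 0.922 in the table, which is why precision, recall, F1, and AUROC are more informative than accuracy for this imbalanced outcome.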
Figure 2. Ranking of the importance of the top 20 factors for lower respiratory tract infection.
Areas under the receiver operating characteristic curve (AUROC), accuracy, and precision as the number of top-ranked important factors increases cumulatively from 1 to 10.
| Number of top factors | AUROC | Accuracy | Precision |
|---|---|---|---|
| 1 | 0.6527 | 0.9221 | <0.0001 |
| 2 | 0.6714 | 0.9221 | <0.0001 |
| 3 | 0.6795 | 0.8779 | 0.1428 |
| 4 | 0.6729 | 0.8896 | 0.1474 |
| 5 | 0.6914 | 0.8890 | 0.1559 |
| 6 | 0.6883 | 0.8846 | 0.1487 |
| 7 | 0.6859 | 0.8828 | 0.1523 |
| 8 | 0.6867 | 0.8830 | 0.1529 |
| 9 | 0.6858 | 0.8819 | 0.1472 |
| 10 | 0.6835 | 0.8806 | 0.1518 |
AUROC, area under the receiver operating characteristic curve.
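AUROC, the main discrimination measure in the two tables above, equals the probability that a randomly chosen child with LRTI receives a higher predicted score than a randomly chosen child without LRTI, with ties counted as one half (the Mann-Whitney formulation). A value of 0.5 corresponds to chance, so AUROCs such as 0.514 (K-nearest neighbor) and 0.510 (extreme gradient boosting) indicate little discrimination despite high accuracy. A minimal sketch of that pairwise definition, using hypothetical scores:

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    O(n * m) pairwise version, clear but slow; rank-based formulas scale better.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted P(LRTI) for three children with and three without LRTI.
with_lrti = [0.9, 0.8, 0.4]
without_lrti = [0.3, 0.5, 0.2]
print(round(auroc(with_lrti, without_lrti), 4))  # 8 of 9 pairs ranked correctly
```

Unlike accuracy and precision, this quantity does not depend on a decision threshold, which is why it can keep rising as factors are added even while accuracy drops.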
Model loss and accuracy of the deep-learning sequential model with three optimizers in the training and testing groups.
| Optimizer | Training loss | Training accuracy | Testing loss | Testing accuracy |
|---|---|---|---|---|
| Adam | 23.04% | 91.94% | 27.62% | 92.26% |
| RMSprop | 24.89% | 91.92% | 26.68% | 92.29% |
| SGD | 26.05% | 91.77% | 25.62% | 92.26% |
| | | | | |
| Adam | 25.94% | 91.96% | 25.61% | 92.22% |
| RMSprop | 27.60% | 91.50% | 25.65% | 92.22% |
| SGD | 26.52% | 91.75% | 25.52% | 92.22% |
Adam, adaptive moment estimation; RMSprop, root mean square propagation; SGD, stochastic gradient descent.
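The three optimizers compared above differ only in how they turn a gradient into a parameter update: SGD steps along the raw gradient, RMSprop divides by a running root mean square of recent gradients, and Adam adds a bias-corrected momentum term on top of that scaling. The sketch below implements these textbook update rules on a hypothetical one-parameter loss (theta - 3)^2. The learning rates and decay constants are common defaults chosen for illustration, not the settings used in the study.

```python
import math


class SGD:
    def __init__(self, lr=0.05):
        self.lr = lr

    def step(self, theta, grad):
        return theta - self.lr * grad


class RMSprop:
    def __init__(self, lr=0.01, rho=0.9, eps=1e-8):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.v = 0.0  # running mean of squared gradients

    def step(self, theta, grad):
        self.v = self.rho * self.v + (1 - self.rho) * grad ** 2
        return theta - self.lr * grad / (math.sqrt(self.v) + self.eps)


class Adam:
    def __init__(self, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first moment (momentum)
        self.v = 0.0  # second moment
        self.t = 0

    def step(self, theta, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimize(optimizer, steps=2000, theta=0.0):
    """Run the optimizer on the toy loss (theta - 3)^2, whose minimum is theta = 3."""
    for _ in range(steps):
        grad = 2 * (theta - 3.0)
        theta = optimizer.step(theta, grad)
    return theta


for opt in (Adam(), RMSprop(), SGD()):
    print(type(opt).__name__, round(minimize(opt), 3))
```

With a constant learning rate, RMSprop's normalized steps keep it hovering within roughly one step size of the minimum, whereas SGD and Adam settle on it; on a real network the differences show up as the small gaps in loss seen in the table.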