Mingjuan Zhou1, Tianci Yao2, Jian Li3, Hui Hui4, Weimin Fan1, Yunfeng Guan5, Aijun Zhang1, Bufang Xu1,6.
Abstract
Introduction: Semen quality has decreased gradually in recent years, and lifestyle changes are among the primary causes of this decline. However, the specific lifestyle factors that affect semen quality have yet to be identified. Materials and methods: In this study, the following data were collected from 5,109 men examined at our reproductive medicine center: 10 lifestyle factors that potentially affect semen quality (smoking status, alcohol consumption, staying up late, sleeplessness, consumption of pungent food, intensity of sports activity, sedentary lifestyle, working in hot conditions, sauna use in the last 3 months, and exposure to radioactivity); general factors, including age, abstinence period, and season of semen examination; and comprehensive semen parameters [semen volume, sperm concentration, progressive and total sperm motility, sperm morphology, and DNA fragmentation index (DFI)]. Machine learning with the XGBoost algorithm was then applied to the collected data to establish a primary prediction model. The accuracy of the model was further verified via multiple logistic regression following k-fold cross-validation analyses.
Keywords: artificial intelligence; extreme gradient boosting (XGBoost); lifestyles; machine learning; semen quality
Year: 2022 PMID: 36177329 PMCID: PMC9514383 DOI: 10.3389/fmed.2022.811890
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
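The k-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the number of folds or how the data were shuffled, so `k = 5`, the fixed seed, and the helper name `k_fold_indices` are assumptions made only for illustration; this is not the authors' code.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so the folds are random but reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 5,109 is the cohort size from the abstract; k = 5 is an assumed value.
folds = list(k_fold_indices(n_samples=5109, k=5))
fold_sizes = [len(val_idx) for _, val_idx in folds]
print(fold_sizes)
```

Each participant lands in exactly one validation fold, so every record is used for validation once and for training in the remaining folds.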
Figure 1. Flow chart of the study population. A total of 5,109 men were included in this study; all participants underwent a routine semen analysis, and some also underwent sperm morphology and DNA fragmentation index (DFI) assays, depending on the purpose of the examination.
Distribution of male participants whose data were used for machine learning and the hyperparameters for XGBoost.
| Semen parameter | Total | Training set | Test set | | | | | |
|---|---|---|---|---|---|---|---|---|
| Semen volume | 5,109 | 3,576 | 1,533 | 0.01 | 600 | 3 | 1 | 1 |
| Sperm concentration | 5,109 | 3,576 | 1,533 | 0.01 | 750 | 3 | 1 | 1 |
| Progressive motility | 5,109 | 3,576 | 1,533 | 0.01 | 600 | 3 | 1 | 1 |
| Total motility | 5,109 | 3,576 | 1,533 | 0.01 | 600 | 5 | 1 | 1 |
| Sperm morphology | 2,511 | 1,758 | 754 | 0.01 | 300 | 4 | 1 | 1 |
| DFI | 1,812 | 1,268 | 544 | 0.01 | 300 | 4 | 1 | 1 |
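The participant counts in the table above can be checked with a few lines of arithmetic. The sketch below reads the three count columns as total, training set, and test set; that reading is an assumption, because the header row did not survive extraction. Under that reading, the split is close to 70/30 for every parameter, and the sperm morphology row is the only one whose two subsets do not add up to the listed total (1,758 + 754 = 2,512 against 2,511). The sketch reports that row as it is and does not attempt to correct it.

```python
# (label, total, training, test) copied from the table above.
# The column meanings are an assumption: the header row was lost.
rows = [
    ("Semen volume", 5109, 3576, 1533),
    ("Sperm concentration", 5109, 3576, 1533),
    ("Progressive motility", 5109, 3576, 1533),
    ("Total motility", 5109, 3576, 1533),
    ("Sperm morphology", 2511, 1758, 754),
    ("DFI", 1812, 1268, 544),
]


def split_summary(total, train, test):
    """Return (training share of train + test, whether train + test equals total)."""
    return train / (train + test), train + test == total


summary = {label: split_summary(total, train, test) for label, total, train, test in rows}

for label, (train_share, adds_up) in summary.items():
    print(f"{label}: training share {train_share:.3f}, counts add up: {adds_up}")
```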
Figure 2. Information regarding the general and lifestyle characteristics of study participants.
Outcomes of machine learning using XGBoost.
| Semen parameter | Accuracy | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Semen volume | 0.7025 | 63 | 71 | 385 | 1,014 | 0.7248 | 0.4701 | 0.9346 | 0.1406 |
| Sperm concentration | 0.6758 | 94 | 87 | 410 | 942 | 0.6967 | 0.5193 | 0.9155 | 0.1865 |
| Progressive motility | 0.6282 | 269 | 147 | 423 | 694 | 0.6213 | 0.6466 | 0.8252 | 0.3887 |
| Total motility | 0.6067 | 218 | 157 | 446 | 712 | 0.6149 | 0.5813 | 0.8193 | 0.3283 |
| DFI | 0.6838 | 331 | 139 | 33 | 41 | 0.5541 | 0.7043 | 0.2278 | 0.9093 |
| Sperm morphology | 0.6167 | 55 | 83 | 206 | 410 | 0.6656 | 0.3986 | 0.8316 | 0.2107 |
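The numeric columns of the table above are internally consistent, which a short calculation can confirm. In each row the four integer counts sum to the size of the held-out subset (1,533, 544, or 754), so they behave like the four cells of a confusion matrix. Calling them `a`, `b`, `c`, `d` in column order (hypothetical names, since the header row was lost), the first decimal column equals the overall accuracy `(a + d) / n`, and the last four columns equal `d / (c + d)`, `a / (a + b)`, `d / (b + d)`, and `a / (a + c)`. Which of those four ratios is sensitivity, specificity, or a predictive value depends on which count columns hold true and false positives, and the damaged header does not say.

```python
# (label, accuracy, a, b, c, d, ratio_1, ratio_2, ratio_3, ratio_4) in table
# column order. The names a-d and ratio_1-ratio_4 are placeholders, not the
# original column headers.
rows = [
    ("Semen volume", 0.7025, 63, 71, 385, 1014, 0.7248, 0.4701, 0.9346, 0.1406),
    ("Sperm concentration", 0.6758, 94, 87, 410, 942, 0.6967, 0.5193, 0.9155, 0.1865),
    ("Progressive motility", 0.6282, 269, 147, 423, 694, 0.6213, 0.6466, 0.8252, 0.3887),
    ("Total motility", 0.6067, 218, 157, 446, 712, 0.6149, 0.5813, 0.8193, 0.3283),
    ("DFI", 0.6838, 331, 139, 33, 41, 0.5541, 0.7043, 0.2278, 0.9093),
    ("Sperm morphology", 0.6167, 55, 83, 206, 410, 0.6656, 0.3986, 0.8316, 0.2107),
]


def derived(a, b, c, d):
    """Recompute the five decimal columns from the four counts."""
    n = a + b + c + d
    return ((a + d) / n, d / (c + d), a / (a + b), d / (b + d), a / (a + c))


max_error = 0.0
for label, accuracy, a, b, c, d, *ratios in rows:
    computed = derived(a, b, c, d)
    reported = (accuracy, *ratios)
    row_error = max(abs(x - y) for x, y in zip(computed, reported))
    max_error = max(max_error, row_error)
    print(f"{label}: n = {a + b + c + d}, largest gap = {row_error:.5f}")
```

Every recomputed value agrees with the reported one to within rounding at four decimal places.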
Figure 3. XGBoost and logistic regression analysis of the risk factors for semen volume. The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curve (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.
Figure 4. XGBoost and logistic regression analysis of the risk factors for sperm concentration. The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curve (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.
Figure 5. XGBoost and logistic regression analysis of the risk factors for progressive sperm motility. The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curves (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.
Figure 6. XGBoost and logistic regression analysis of the risk factors for total sperm motility. The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curve (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.
Figure 7. XGBoost and logistic regression analysis of the risk factors for sperm morphology. The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curve (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.
Figure 8. XGBoost and logistic regression analysis of the risk factors for the DNA fragmentation index (DFI). The ROC curve (A) and feature importance (B) analyzed by XGBoost and the ROC curve (C) and forest diagram showing significant risk factors (D) analyzed by logistic regression.