Xiaolu Tian, Yutian Chong, Yutao Huang, Pi Guo, Mengjie Li, Wangjian Zhang, Zhicheng Du, Xiangyong Li, Yuantao Hao
Abstract
Hepatitis B surface antigen (HBsAg) seroclearance during treatment is associated with a better prognosis among patients with chronic hepatitis B (CHB). However, significant gaps remain in our understanding of how to predict HBsAg seroclearance accurately and efficiently from obtainable clinical information. This study aimed to identify the optimal model for predicting HBsAg seroclearance. We obtained laboratory and demographic information for 2,235 patients with CHB from the South China Hepatitis Monitoring and Administration (SCHEMA) cohort; HBsAg seroclearance occurred in 106 of these patients. We developed models based on four algorithms: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DCT), and logistic regression (LR). The optimal model was selected according to the area under the receiver operating characteristic curve (AUC). The AUCs of the XGBoost, RF, DCT, and LR models were 0.891, 0.829, 0.619, and 0.680, respectively, with XGBoost showing the best predictive performance. The variable importance plot of the XGBoost model indicated that the HBsAg level was the most important variable, followed by age and the hepatitis B virus (HBV) DNA level. Machine learning algorithms, especially XGBoost, showed satisfactory performance in predicting HBsAg seroclearance. These results show the potential of machine learning algorithms for predicting HBsAg seroclearance from obtainable clinical data.
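The AUC used for model selection has a direct interpretation: it is the probability that a randomly chosen patient who achieved HBsAg seroclearance receives a higher predicted score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it that way in plain Python and then picks the model with the largest of the reported AUCs. The helper name `roc_auc` and the toy scores are hypothetical; this pairwise form costs O(P·N) comparisons, so library implementations work from sorted scores instead.

```python
from itertools import product


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: the share of positive/negative pairs in
    which the positive case has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = HBsAg seroclearance, 0 = no seroclearance (made-up scores).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Model selection as described: keep the model with the largest reported AUC.
reported_auc = {"XGBoost": 0.891, "RF": 0.829, "DCT": 0.619, "LR": 0.680}
best = max(reported_auc, key=reported_auc.get)
print(best)  # XGBoost
```

In the toy example, three of the four positive/negative pairs are ranked correctly, giving 0.75.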
Year: 2019 PMID: 31281411 PMCID: PMC6594274 DOI: 10.1155/2019/6915850
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Summary of participants' characteristics.
| Variables | Value |
|---|---|
| Age (years)<sup>a</sup> | 40.58 ± 12.07 |
| Gender (male)<sup>b</sup> | 1636 (73.2) |
| BMI<sup>a</sup> (kg/m²) | 22.53 ± 3.96 |
| Drinking history<sup>b</sup> | 256 (11.5) |
| HBV family history<sup>b</sup> | 1350 (60.4) |
| HCC family history<sup>b</sup> | 188 (8.4) |
| Initial diagnosis<sup>b</sup> | |
| Inactive hepatitis B virus carrier | 12 (0.5) |
| Chronic hepatitis B | 1966 (88.0) |
| Hepatitis cirrhosis | 216 (9.7) |
| Hepatocellular carcinoma | 41 (1.8) |
| Current diagnosis<sup>b</sup> | |
| Hepatitis B virus carrier | 13 (0.5) |
| Chronic hepatitis B | 1875 (83.9) |
| Hepatitis cirrhosis | 222 (9.9) |
| Hepatocellular carcinoma | 125 (5.6) |
| ALT<sup>a</sup> (U/L) | 95.91 ± 167.82 |
| AST<sup>a</sup> (U/L) | 144.68 ± 280.77 |
| GGT<sup>a</sup> (U/L) | 59.32 ± 79.35 |
| ALB<sup>a</sup> (g/L) | 44.61 ± 5.37 |
| TBIL<sup>a</sup> (μmol/L) | 25.40 ± 52.83 |
| DBIL<sup>a</sup> (μmol/L) | 11.17 ± 41.19 |
| PLT<sup>a</sup> (×10<sup>9</sup>/L) | 175.19 ± 67.53 |
| HBV DNA<sup>a</sup> (log IU/mL) | 5.57 ± 2.05 |
| HBsAg<sup>a</sup> (log IU/mL) | 3.42 ± 0.81 |
| HBeAg<sup>a</sup> (log IU/mL) | 0.71 ± 1.63 |
| WBC<sup>a</sup> (×10<sup>9</sup>/L) | 6.03 ± 1.93 |
| HB<sup>a</sup> (g/L) | 140.66 ± 34.48 |
| RLOD<sup>a</sup> (mm) | 114.85 ± 27.92 |
| PVW<sup>a</sup> (mm) | 11.37 ± 4.17 |
| SL<sup>a</sup> (mm) | 102.74 ± 21.19 |
| SPVW<sup>a</sup> (mm) | 6.17 ± 4.39 |
| Initial treatment<sup>b</sup> | |
| None | 874 (39.1) |
| LMV | 248 (11.1) |
| ADV | 277 (12.4) |
| LdT | 111 (5.0) |
| ETV | 610 (27.3) |
| TDF | 62 (2.8) |
| LMV + ADV | 47 (2.1) |
| LdT + ADV | 4 (0.2) |
| ETV + ADV | 2 (0.1) |
| Number of treatment lines<sup>b</sup> | |
| 0 | 1035 (46.3) |
| 1 | 818 (36.6) |
| 2 | 211 (9.4) |
| 3 | 93 (4.2) |
| 4 | 46 (2.1) |
| 5 | 18 (0.8) |
| 6 | 10 (0.5) |
| 7 | 1 (0.0) |
| 8 | 2 (0.1) |
| 9 | 1 (0.0) |
| Current treatment<sup>b</sup> | |
| None | 1019 (45.6) |
| LMV | 68 (3.0) |
| ADV | 152 (6.8) |
| LdT | 26 (1.2) |
| ETV | 612 (27.4) |
| TDF | 252 (11.3) |
| LMV + ADV | 79 (3.5) |
| LdT + ADV | 6 (0.3) |
| ETV + ADV | 21 (0.9) |
| IFN<sup>b</sup> | 115 (5.1) |
| VR<sup>b</sup> | |
| IVR | 332 (14.9) |
| EVR | 976 (43.7) |
| SOR | 927 (41.5) |
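<sup>a</sup>Mean ± standard deviation; <sup>b</sup>frequency (percentage). BMI: body mass index; HBV: hepatitis B virus; HCC: hepatocellular carcinoma; ALT: alanine aminotransferase; AST: aspartate aminotransferase; GGT: γ-glutamyl transferase; ALB: albumin; TBIL: total bilirubin; DBIL: direct bilirubin; PLT: platelet count; HBsAg: hepatitis B surface antigen; HBeAg: hepatitis B e antigen; WBC: white blood cell count; HB: hemoglobin; LMV: lamivudine; ADV: adefovir; LdT: telbivudine; ETV: entecavir; TDF: tenofovir disoproxil fumarate; IFN: interferon; VR: virological response; IVR: initial virological response; EVR: early virological response; SOR: suboptimal virological response.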
Summary of parameter values in each model for predicting HBsAg seroclearance.
| Model | Parameter | Value |
|---|---|---|
| Extreme gradient boosting | n_estimators | 153 |
| | max_depth | 4 |
| | min_child_weight | 2 |
| | subsample | 0.5 |
| | colsample_bytree | 0.8 |
| | colsample_bylevel | 0.8 |
| | reg_alpha | 2.0 |
| | reg_lambda | 0.3 |
| Random forest | max_features | auto |
| | min_samples_leaf | 1 |
| | n_estimators | 40 |
| Decision tree | max_depth | 29 |
| | max_features | log2 |
| | min_samples_leaf | 23 |
| Logistic regression | C | 0.001 |
| | penalty | l1 |
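The parameter names above match the constructor arguments of the scikit-learn wrapper in `xgboost` and of scikit-learn's `RandomForestClassifier`, `DecisionTreeClassifier`, and `LogisticRegression`. Assuming those libraries were used (the names suggest it, but the text does not say so), a minimal sketch of the configuration follows. The dictionary names and the `build_models()` helper are hypothetical, and two settings are labeled assumptions: `"sqrt"` replaces `"auto"` for the random forest, and a solver is chosen for the L1-penalized logistic regression because none is reported.

```python
# Hyperparameters from the table above, arranged as keyword arguments for the
# xgboost / scikit-learn estimator constructors. The dictionary names and the
# build_models() helper are hypothetical, not the authors' code.

XGB_PARAMS = {
    "n_estimators": 153,
    "max_depth": 4,
    "min_child_weight": 2,
    "subsample": 0.5,
    "colsample_bytree": 0.8,
    "colsample_bylevel": 0.8,
    "reg_alpha": 2.0,   # L1 regularization on leaf weights
    "reg_lambda": 0.3,  # L2 regularization on leaf weights
}

RF_PARAMS = {
    # The table reports "auto", which meant sqrt(n_features) for classifiers;
    # scikit-learn 1.3+ no longer accepts "auto", so "sqrt" is used instead.
    "max_features": "sqrt",
    "min_samples_leaf": 1,
    "n_estimators": 40,
}

DT_PARAMS = {"max_depth": 29, "max_features": "log2", "min_samples_leaf": 23}

LR_PARAMS = {
    "C": 0.001,
    "penalty": "l1",
    # Assumption: the solver is not reported. The default "lbfgs" solver does
    # not support an L1 penalty, so "liblinear" is chosen here.
    "solver": "liblinear",
}


def build_models():
    """Return the four untrained classifiers (needs xgboost and scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    return {
        "Extreme gradient boosting": XGBClassifier(**XGB_PARAMS),
        "Random forest": RandomForestClassifier(**RF_PARAMS),
        "Decision tree": DecisionTreeClassifier(**DT_PARAMS),
        "Logistic regression": LogisticRegression(**LR_PARAMS),
    }
```

Fitting each returned estimator on the same training split would reproduce the comparison only in outline; the data split, preprocessing, and random seeds are not reported here and would also have to be assumed.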
Summary of predictive performance of each model.
| Model | TP | FN | TN | FP | Precision | Sensitivity | F1 score | AUC (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Logistic regression | 0 | 35 | 636 | 0 | 1.00 | 0.95 | 0.97 | 0.680 (0.677, 0.683) |
| Decision tree | 4 | 31 | 627 | 9 | 0.97 | 0.94 | 0.95 | 0.619 (0.614, 0.624) |
| Random forest | 4 | 31 | 635 | 1 | 0.99 | 0.95 | 0.97 | 0.829 (0.824, 0.834) |
| Extreme gradient boosting | 9 | 26 | 632 | 4 | 0.98 | 0.96 | 0.97 | 0.891 (0.889, 0.895) |
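The metric column following Sensitivity agrees, for every model, with the F1 score, i.e. the harmonic mean of precision and sensitivity, F1 = 2·P·S / (P + S). The short check below recomputes it from the reported precision and sensitivity; the helper name `harmonic_f1` is hypothetical, and the inputs are the rounded values from the table, so agreement is only to two decimals.

```python
def harmonic_f1(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision, sensitivity, value in the column after Sensitivity), as reported
REPORTED = {
    "Logistic regression": (1.00, 0.95, 0.97),
    "Decision tree": (0.97, 0.94, 0.95),
    "Random forest": (0.99, 0.95, 0.97),
    "Extreme gradient boosting": (0.98, 0.96, 0.97),
}

for model, (p, s, reported) in REPORTED.items():
    print(f"{model}: harmonic mean = {harmonic_f1(p, s):.2f}, reported = {reported:.2f}")
```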
Figure 1. Receiver operating characteristic curves of logistic regression.
Figure 2. Receiver operating characteristic curves of decision tree.
Figure 3. Receiver operating characteristic curves of random forest.
Figure 4. Receiver operating characteristic curves of extreme gradient boosting.
Figure 5. Variable importance plot of the XGBoost model for predicting HBsAg seroclearance.