Mingjie Wang1,2, Li Chen2, MinHui Dong3, Jing Li1, Beidi Zhu3, Zhitao Yang4, Qiming Gong5, Yue Han1, Demin Yu1, Donghua Zhang1, Fabien Zoulim6, Jiming Zhang3, Xinxin Zhang1.
Abstract
Few non-invasive models have been established for precisely identifying the immune tolerant (IT) phase of chronic hepatitis B (CHB). This study aimed to develop a novel approach that combined next-generation sequencing (NGS) and machine learning algorithms using our recently published viral quasispecies (QS) analysis package. A total of 290 HBeAg-positive patients who underwent liver biopsy were enrolled and divided into a training group (n = 148) and a validation group (n = 142). HBV DNA was extracted and QS sequences were obtained by NGS. Hierarchical clustering analysis (HCA) and principal component analysis (PCA) based on viral operational taxonomic units (OTUs) were performed to explore the correlations between QS and clinical phenotypes. Three machine learning algorithms (K-nearest neighbour, support vector machine, and random forest) were used to construct diagnostic models for IT phase classification. Based on histopathology, 90 patients were diagnosed as IT and 200 as CHB. HBsAg titres were higher in IT patients than in CHB patients (p < 0.001). HCA and PCA grouped IT and CHB patients into two distinct clusters. The relative abundance of viral OTUs differed mainly within the BCP/precore/core region and was significantly correlated with liver inflammation and fibrosis. For IT phase classification, all machine learning models showed higher AUC values than models based on HBsAg, APRI, and FIB-4. The relative abundance of viral OTUs reflects the severity of liver inflammation and fibrosis. The novel QS quantitative analysis approach could be used to diagnose IT patients more precisely and reduce the need for liver biopsy.
Keywords: Chronic hepatitis B; clinical pathology; decision support techniques; machine learning; natural history; quasispecies
Year: 2021 PMID: 33870846 PMCID: PMC8812768 DOI: 10.1080/22221751.2021.1919033
Source DB: PubMed Journal: Emerg Microbes Infect ISSN: 2222-1751 Impact factor: 7.163
Figure 1. (A) Flowchart of patient enrolment in the study. (B) Schematic diagram of the experimental and data analysis workflow. Briefly, HBV genomic DNA was extracted from serum and amplified with 9 primer pairs, then sequenced by NGS, which generated pooled sequencing reads from different viral strains. QS were then quantified based on the abundances of viral OTUs and clustered using HCA and PCA. Finally, classification models were constructed using machine learning algorithms based on the sample clusters.
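The quantification step in the workflow above (turning per-sample OTU read counts into relative abundances) can be illustrated with a minimal sketch. This is not the authors' QS analysis package: the function names, the sample names, and the read counts are all hypothetical, and the sketch assumes that relative abundance simply means each OTU's share of a sample's total reads.

```python
def relative_abundance(otu_counts):
    """Convert raw read counts per OTU into fractions that sum to 1."""
    total = sum(otu_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {otu: count / total for otu, count in otu_counts.items()}


def abundance_matrix(samples):
    """Return (sorted OTU names, {sample: abundance row}); absent OTUs get 0.0."""
    otus = sorted({otu for counts in samples.values() for otu in counts})
    rows = {}
    for name, counts in samples.items():
        fractions = relative_abundance(counts)
        rows[name] = [fractions.get(otu, 0.0) for otu in otus]
    return otus, rows


# Hypothetical read counts for two samples (not data from the study).
samples = {
    "patient_A": {"OTU1": 900, "OTU2": 100},
    "patient_B": {"OTU1": 400, "OTU2": 500, "OTU3": 100},
}

otus, rows = abundance_matrix(samples)
for name, row in rows.items():
    print(name, [round(value, 3) for value in row])
```

A samples-by-OTUs table of this kind is the sort of input that clustering (HCA), PCA, and the downstream classifiers would typically operate on.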
Clinical characteristics of the study patients in the IT and CHB groups.
| Characteristic | IT group (n = 90) | CHB group (n = 200) | p value |
|---|---|---|---|
| Sex (Male, %) | 80 (88.89) | 152 (76.00) | 0.02 |
| Age (years) | 32.83 ± 9.18 | 36.75 ± 10.68 | <0.01 |
| ALT (IU/L) | 47.00 (29.50–70.50) | 61.00 (40.50–110.50) | <0.01 |
| AST (IU/L) | 29.50 (23.00–39.25) | 40.50 (30.25–63.75) | <0.01 |
| PLT (10^9/L) | 202.96 ± 50.99 | 176.50 ± 54.94 | <0.01 |
| HBV DNA (log10 IU/mL) | 7.56 ± 0.52 | 6.59 ± 1.29 | <0.01 |
| HBsAg (log10 IU/mL) | 4.60 ± 0.66 | 3.90 ± 0.77 | <0.01 |
| Genotype B/C (n) | 38/52 | 78/122 | 0.75 |
| G0/G1/G2/G3/G4 (n) | 30/60/0/0/0 | 4/26/112/47/11 | <0.01 |
| S0/S1/S2/S3/S4 (n) | 55/35/0/0/0 | 2/48/86/31/33 | <0.01 |
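
The age, ALT, AST, and PLT values in the table above are the inputs to the two serum indices used as comparators in the abstract, APRI and FIB-4. The sketch below uses the standard published formulas for both indices. The patient values are hypothetical, and the AST upper limit of normal (40 IU/L) is an assumption, since the source does not state the cut-off used.

```python
import math


def apri(ast_iu_l, plt_10e9_l, ast_uln_iu_l=40.0):
    """AST-to-platelet ratio index: (AST / ULN) / PLT * 100.

    ast_uln_iu_l is an assumed upper limit of normal; adjust to the local lab.
    """
    return (ast_iu_l / ast_uln_iu_l) / plt_10e9_l * 100.0


def fib4(age_years, ast_iu_l, alt_iu_l, plt_10e9_l):
    """Fibrosis-4 index: (age * AST) / (PLT * sqrt(ALT))."""
    return (age_years * ast_iu_l) / (plt_10e9_l * math.sqrt(alt_iu_l))


# Hypothetical patient (not taken from the study data).
patient = {"age": 40, "ast": 40.0, "alt": 64.0, "plt": 200.0}

apri_score = apri(patient["ast"], patient["plt"])
fib4_score = fib4(patient["age"], patient["ast"], patient["alt"], patient["plt"])
print(f"APRI = {apri_score:.2f}, FIB-4 = {fib4_score:.2f}")
```

Both indices rise as AST rises or platelets fall, which is why they are commonly used as non-invasive markers of fibrosis.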
Figure 2. Scatter plots of PCA results for the 9 amplicons in the training group. Panels (A–I) correspond to amplicons P1 to P9 (amplified by primers P1 to P9, respectively). Each dot represents a sample: red dots represent CHB patients and blue dots represent IT patients. The x-axis and y-axis represent the top two principal components (PCs), PC1 and PC2, respectively.
Figure 3. HCA and PCA based on viral OTUs of amplicon P5 in the training group. (A) Hierarchical clustering heatmap of viral OTUs. Each column corresponds to the viral OTUs within one patient, and each row corresponds to the relative abundance of a representative OTU across all patients. The colours corresponding to the scale bars and traits are shown on the left. (B) Principal component analysis of viral OTUs in the 148 patients in the training group. PC1 and PC2 were used as the x-axis and y-axis, respectively. Each dot represents one sample, and the colours indicate different groups.
Statistical significance of associations between PCs and clinical profiles.
| Phenotype | PC1 p value | PC1 correlation | PC2 p value | PC2 correlation |
|---|---|---|---|---|
| Gender (Male/Female) | 3.40E−02 | 0.07 | 2.74E−01 | 0.02 |
| Age (years) | 1.06E−01 | 0.10 | 6.32E−01 | 0.03 |
| ALT (IU/L) | 5.67E−01 | 0.03 | 2.97E−04 | 0.21 |
| AST (IU/L) | 2.20E−02 | 0.14 | 1.38E−05 | 0.26 |
| PLT (10^9/L) | 4.86E−02 | −0.12 | 1.94E−03 | −0.19 |
| HBV DNA (log10 IU/mL) | 4.25E−07 | −0.30 | 1.96E−01 | −0.08 |
| HBsAg (log10 IU/mL) | 1.81E−08 | −0.33 | 1.53E−05 | −0.26 |
| Inflammation grade (G) | 2.98E−10 | 0.42 | 8.19E−06 | 0.38 |
| Fibrosis stage (S) | 6.47E−10 | 0.43 | 2.01E−05 | 0.34 |
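
A correlation between a principal component and a clinical variable, as tabulated above, can be computed as in the following sketch. The source does not say which coefficient was used, so Pearson's r is an assumption here, and the PC1 scores and inflammation grades are hypothetical values chosen only for illustration. The sketch does not compute a p value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical PC1 scores and inflammation grades for five patients.
pc1_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
inflammation_grade = [0, 1, 1, 2, 3]

r = pearson_r(pc1_scores, inflammation_grade)
print(f"r = {r:.2f}")
```

For ordinal variables such as inflammation grade or fibrosis stage, a rank-based coefficient such as Spearman's could be a reasonable alternative.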
Comparison of the performance between diagnostic models and clinical parameters in identifying the IT or CHB patients.
| Group | Metric | SVM | KNN | RF | HBsAg | ALT | FIB-4 | APRI |
|---|---|---|---|---|---|---|---|---|
| Training group | SPEC | 0.9689 | 0.9689 | 0.9678 | 0.8218 | 0.6602 | 0.4353 | 0.7326 |
| Training group | SENS | 1.0000 | 0.9598 | 0.9670 | 0.9211 | 0.5854 | 0.7857 | 0.6552 |
| Training group | ACC | 0.9817 | 0.9657 | 0.9675 | 0.8489 | 0.6389 | 0.5221 | 0.7130 |
| Training group | AUC | 0.9845 | 0.9652 | 0.9681 | 0.8876 | 0.6153 | 0.5576 | 0.7153 |
| Validation group | SPEC | 0.9349 | 0.9411 | 0.9446 | 0.6744 | 0.4255 | 0.6484 | 0.6813 |
| Validation group | SENS | 0.8327 | 0.8192 | 0.8141 | 0.7447 | 0.8958 | 0.6905 | 0.7143 |
| Validation group | ACC | 0.9031 | 0.9032 | 0.9040 | 0.6992 | 0.5845 | 0.6617 | 0.6917 |
| Validation group | AUC | 0.8838 | 0.8801 | 0.8793 | 0.6759 | 0.6806 | 0.7033 | 0.7276 |
SPEC, specificity; SENS, sensitivity; ACC, accuracy; AUC, area under ROC curve; SVM, support vector machine; KNN, K-nearest neighbour; RF, random forest; FIB-4, fibrosis-4 index; APRI, AST-to-platelet ratio index.
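The four metrics defined above can be computed from labels and model scores as in the following minimal sketch. It assumes that IT is the positive class (label 1) and CHB the negative class (label 0), which the source does not state explicitly. The labels, scores, and the 0.5 decision threshold are hypothetical. AUC is computed with the rank-based (Mann-Whitney) formulation, counting ties as half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "SENS": tp / (tp + fn),
        "SPEC": tn / (tn + fp),
        "ACC": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = IT, 0 = CHB) and model scores; not study data.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
metrics["AUC"] = auc(y_true, scores)
print({name: round(value, 4) for name, value in metrics.items()})
```

Unlike sensitivity, specificity, and accuracy, AUC does not depend on the chosen threshold, which makes it convenient for comparing a classifier against a single continuous marker such as HBsAg.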
Figure 4. ROC curves of three diagnostic models using machine learning methods compared with HBsAg and ALT levels in identifying IT and CHB patients. (A–C) ROC curves of the three diagnostic models constructed using SVM, KNN and RF methods in the training group. (D–F) ROC curves of the three diagnostic models constructed using SVM, KNN and RF methods in the validation group. The coloured ribbons correspond to the 95% CI of the ROC curves. SVM, support vector machine; KNN, K-nearest neighbour; RF, random forest; FIB-4, fibrosis-4 index; APRI, AST-to-platelet ratio index.