Canbiao Wu, Xiaofang Guo, Mengyuan Li, Jingxian Shen, Xiayu Fu, Qingyu Xie, Zeliang Hou, Manman Zhai, Xiaofan Qiu, Zifeng Cui, Hongxian Xie, Pengmin Qin, Xuchu Weng, Zheng Hu, Jiuxing Liang.
Abstract
BACKGROUND: The hepatitis B virus (HBV) is one of the main causes of viral hepatitis and liver cancer. HBV integration is one of the key steps in the virus-promoted malignant transformation.
Keywords: Bioinformatics; Deep learning; Genomic features; HBV integration sites
Year: 2021 PMID: 34233610 PMCID: PMC8261932 DOI: 10.1186/s12862-021-01869-8
Source DB: PubMed Journal: BMC Ecol Evol ISSN: 2730-7182
Fig. 1 The deep learning framework applied in DeepHBV. (a) Scheme for encoding a 2 kb DNA sequence into a binary matrix using one-hot encoding; (b) a brief flowchart of the DeepHBV structure, with matrix shapes given in brackets; a detailed flowchart is provided in Supplementary Figure 1
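The encoding step in panel (a) can be illustrated with a minimal sketch. This is not the DeepHBV implementation: the column order A, C, G, T, the all-zero row for ambiguous bases such as N, and the function name `one_hot_encode` are assumptions made for this example.

```python
# Assumed column order; the source does not state which order DeepHBV uses.
BASES = "ACGT"


def one_hot_encode(seq):
    """Encode a DNA string as a len(seq) x 4 binary matrix (list of rows)."""
    index = {base: i for i, base in enumerate(BASES)}
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        col = index.get(base)  # None for N or any other ambiguous code
        if col is not None:
            row[col] = 1
        matrix.append(row)
    return matrix


# Short demonstration on a 5-base toy sequence.
for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
    print(base, row)
```

Under this layout, a 2000 bp input sequence yields a matrix with 2000 rows and 4 columns.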
Fig. 2 Evaluation of DeepHBV model prediction performance on the test dataset: (a) receiver operating characteristic (ROC) curves and (b) precision-recall (PR) curves
Test results for DNA sequence samples 2000 bp in length
| Fold No | Loss | Accuracy | Sensitivity | Specificity | AUROC | AUPR | F1-score | MCC |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.1799 | 0.7518 | 0.7919 | 0.7452 | 0.7204 | 0.6365 | 0.4752 | 0.3983 |
| 2 | 1.2281 | 0.7473 | 0.7892 | 0.7407 | 0.7084 | 0.6293 | 0.4585 | 0.3855 |
| 3 | 1.1282 | 0.7451 | 0.7799 | 0.7396 | 0.7014 | 0.6198 | 0.4534 | 0.3780 |
| 4 | 1.2226 | 0.7146 | 0.7344 | 0.7125 | 0.6588 | 0.5595 | 0.3326 | 0.2809 |
| 5 | 1.0598 | 0.7178 | 0.7270 | 0.7167 | 0.6627 | 0.5574 | 0.3553 | 0.2915 |
| 6 | 1.1714 | 0.7279 | 0.7812 | 0.7217 | 0.6689 | 0.5720 | 0.3749 | 0.3269 |
| 7 | 1.1656 | 0.7425 | 0.7538 | 0.7406 | 0.7027 | 0.6195 | 0.4582 | 0.3695 |
| 8 | 1.3303 | 0.7486 | 0.8125 | 0.7393 | 0.7104 | 0.6309 | 0.4508 | 0.3905 |
| 9 | 0.7810 | 0.7241 | 0.7294 | 0.7234 | 0.6442 | 0.5477 | 0.3876 | 0.3124 |
| 10 | 1.0883 | 0.7483 | 0.7957 | 0.7409 | 0.7232 | 0.6389 | 0.4580 | 0.3881 |
| Mean | 1.1355 | 0.7368 | 0.7695 | 0.7321 | 0.6901 | 0.6012 | 0.4204 | 0.3521 |
| SD | 0.1385 | 0.0134 | 0.0293 | 0.0114 | 0.0271 | 0.0352 | 0.0494 | 0.0424 |
AUROC, area under the receiver operating characteristic curve; AUPR, area under the precision-recall curve; MCC, Matthews correlation coefficient
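For readers who want to relate the columns to each other, the sketch below gives the standard textbook definitions of the threshold-based metrics and recomputes the Mean and SD rows for the Accuracy column. The function name `classification_metrics` and the confusion-matrix counts in the usage line are hypothetical; the source does not report raw counts, and it does not state which SD estimator was used. The choice of `statistics.pstdev` (population SD, dividing by n) is an assumption that happens to reproduce the reported accuracy SD.

```python
import math
import statistics


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is nonzero.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))

# Accuracy column of the table, folds 1 to 10.
fold_accuracy = [0.7518, 0.7473, 0.7451, 0.7146, 0.7178,
                 0.7279, 0.7425, 0.7486, 0.7241, 0.7483]
mean_acc = statistics.fmean(fold_accuracy)
sd_acc = statistics.pstdev(fold_accuracy)  # population SD (divide by n)
print(round(mean_acc, 4), round(sd_acc, 4))  # → 0.7368 0.0134
```

With the sample estimator (`statistics.stdev`, dividing by n - 1) the accuracy SD would round to 0.0142 instead, so the SD row appears to use the population form.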
Fig. 3 Attention weight distributions analysed by DeepHBV with HBV integration sequences + genomic features. (a) DeepHBV with HBV integration sequences + TCGA Pan Cancer peaks; (b) DeepHBV with HBV integration sequences + repeat peaks. The graphs on the left show the fractions of attention weight, averaged over all samples and normalized to the average over all positions; each index represents a 3 bp region because of the multiple convolution and pooling operations. The graphs on the right show the attention weight distributions of representative positive and negative samples
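The averaging and normalization described for the left-hand graphs can be sketched as follows. This is one plausible reading of the caption, not the authors' code: the function name `normalized_attention_profile`, the nested-list input layout, and the toy numbers are all assumptions.

```python
def normalized_attention_profile(weights):
    """Average attention per position over samples, then scale by the overall mean.

    weights: list of per-sample lists, one attention weight per position index.
    A returned value of 1.0 means "average attention" for that position.
    """
    n_samples = len(weights)
    n_positions = len(weights[0])
    # Step 1: average each position index over all samples.
    per_position = [
        sum(sample[j] for sample in weights) / n_samples
        for j in range(n_positions)
    ]
    # Step 2: divide by the mean over all positions.
    overall = sum(per_position) / n_positions
    return [value / overall for value in per_position]


# Toy weights for 3 samples x 4 position indices (made-up numbers).
toy = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.2, 0.2, 0.4],
    [0.3, 0.2, 0.1, 0.4],
]
profile = normalized_attention_profile(toy)
print([round(v, 3) for v in profile])
```

In this toy case the last position index receives twice the attention of each of the others, so it ends up above 1.0 while the rest fall below it.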
Fig. 4 Attention-intensive regions highlight local genomic features that are essential for predicting HBV integration sites. Representative examples show the positional relationship between the attention-intensive sites and several genomic features, using the DeepHBV with HBV integration sequences + TCGA Pan Cancer model, on (a) chr5:1,294,063-1,296,063 (hg38) and (b) chr5:1,291,277-1,293,277 (hg38)
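As a quick consistency check, the two windows named in the caption can be parsed and measured. The helper `parse_region` is hypothetical, and because the source does not say whether the coordinates are 0-based or 1-based, the span is taken as the plain difference `end - start`.

```python
def parse_region(region):
    """Parse 'chrN:start-end' (commas and spaces allowed) into (chrom, start, end)."""
    chrom, span = region.replace(",", "").replace(" ", "").split(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


for region in ("chr5:1,294,063-1,296,063", "chr5:1,291,277-1,293,277"):
    chrom, start, end = parse_region(region)
    print(chrom, start, end, end - start)
# → chr5 1294063 1296063 2000
# → chr5 1291277 1293277 2000
```

Both windows span 2000 bp, which matches the 2 kb input length used by the model.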
Enriched transcription factor binding sites (TFBS) from attention-intensive regions of DeepHBV with HBV integration sites + TCGA Pan Cancer peaks
| Rank (HOMER known) | Name (HOMER known) | P-value (HOMER known) | Rank (HOMER de novo) | Best match/details (HOMER de novo) | P-value (HOMER de novo) |
|---|---|---|---|---|---|
| 1 | BMAL1 | 1E−323 | 1 | TEAD3 | 1E−2283 |
| 2 | NPAS | 1.00E−259 | 2 | EBF1 | 1E−1926 |
| 3 | CLOCK | 1.00E−165 | 3 | TCF7 | 1E−958 |
| 4 | c-Myc | 1.00E−126 | 4 | GRHL2 | 1E−504 |
| 5 | ZFX | 1.00E−108 | 5 | Dux | 1E−477 |
| 6 | Tgif2 | 1.00E−75 | 6 | Ptf1a | 1E−465 |
| 7 | MNT | 1.00E−71 | 7 | TEAD | 1E−385 |
| 8 | LRF | 1.00E−62 | 8 | Ahr::Arnt | 1.00E−302 |
| 9 | Tbx5 | 1.00E−62 | 9 | Sox5 | 1.00E−245 |
| 10 | ZNF711 | 1.00E−57 | 10 | TEAD | 1.00E−233 |
| 11 | n-Myc | 1.00E−54 | 11 | Zic2 | 1.00E−204 |
| 12 | ZNF416 | 1.00E−52 | 12 | Nr2e3 | 1.00E−197 |
| 13 | USF1 | 1.00E−47 | 13 | SOX18 | 1.00E−182 |
| 14 | bHLHE40 | 1.00E−45 | 14 | ZBTB14 | 1.00E−174 |
| 15 | Rbpj1 | 1.00E−36 | 15 | USF2 | 1.00E−153 |
| 16 | Zac1 | 1.00E−35 | 16 | Isl1 | 1.00E−142 |
| 17 | Tgif1 | 1.00E−32 | 17 | ZNF264 | 1.00E−142 |
| 18 | ZEB1 | 1.00E−30 | 18 | Ascl2 | 1.00E−133 |
| 19 | THRb | 1.00E−29 | 19 | ZNF460 | 1.00E−120 |
| 20 | Ptf1a | 1.00E−29 | 20 | LRF | 1.00E−117 |
| 21 | bHLHE41 | 1.00E−29 | 21 | ZNF416 | 1.00E−117 |
| 22 | TEAD1 | 1.00E−27 | 22 | PKNOX1 | 1.00E−103 |
| 23 | Stat3 | 1.00E−24 | 23 | Bcl6b | 1.00E−91 |
| 24 | Meis1 | 1.00E−21 | 24 | Arnt | 1.00E−90 |
| 25 | c-Myc | 1.00E−21 | 25 | Osr2 | 1.00E−88 |
| 26 | Usf2 | 1.00E−20 | 26 | TFAP2A | 1.00E−79 |
| 27 | NPAS2 | 1.00E−17 | |||
| 28 | HIC1 | 1.00E−17 | |||
| 29 | TEAD | 1.00E−17 | |||
| 30 | TEAD4 | 1.00E−16 | |||
| 31 | AR-halfsite | 1.00E−16 | |||
| 32 | STAT6 | 1.00E−15 | |||
| 33 | TCF4 | 1.00E−13 | |||
| 34 | MITF | 1.00E−13 | |||
| 35 | TEAD3 | 1.00E−13 | |||
| 36 | Atf1 | 1.00E−12 | |||
| 37 | HIF-1b | 1.00E−11 | |||
| 38 | Foxo3 | 1.00E−10 | |||
| 39 | E2A | 1.00E−09 | |||
| 40 | TEAD2 | 1.00E−09 | |||
| 41 | Mef2a | 1.00E−08 | |||
| 42 | ZNF692 | 1.00E−07 | |||
| 43 | Nkx3.1 | 1.00E−07 | |||
| 44 | COUP-TFII | 1.00E−07 | |||
| 45 | MyoG | 1.00E−07 | |||
| 46 | Nkx2.5 | 1.00E−06 | |||
| 47 | Snail1 | 1.00E−05 | |||
| 48 | HEB | 1.00E−05 | |||
| 49 | Tbx6 | 1.00E−05 | |||
| 50 | SCRT1 | 1.00E−04 | |||
| 51 | Nr5a2 | 1.00E−04 | |||
| 52 | Nanog | 1.00E−03 | |||
| 53 | Oct11 | 1.00E−03 | |||
| 54 | Elk1 | 1.00E−03 | |||
| 55 | Erra | 1.00E−03 | |||
| 56 | Gata6 | 1.00E−03 | |||
| 57 | BHLHA15 | 1.00E−03 | |||
| 58 | AMYB | 1.00E−03 | |||
| 59 | Nr5a2 | 1.00E−03 | |||
| 60 | NFkB-p65-Rel | 1.00E−02 | |||
| 61 | Zic | 1.00E−02 | |||
| 62 | TRPS1 | 1.00E−02 | |||
| 63 | Hoxa9 | 1.00E−02 | |||
| 64 | HIF2a | 1.00E−02 | |||
| 65 | Isl1 | 1.00E−02 | |||
| 66 | CEBP:AP1 | 1.00E−02 | |||
| 67 | EWS:FLI1-fusion | 1.00E−02 | |||
| 68 | FOXK1 | 1.00E−02 | |||
| 69 | ETS | 1.00E−02 | |||