James Lara, Mahder Teka, Yury Khudyakov.
Abstract
BACKGROUND: Identification of acute or recent hepatitis C virus (HCV) infections is important for detecting outbreaks and devising timely public health interventions to interrupt transmission. Epidemiological investigations and chemistry-based laboratory tests are the 2 main approaches available for identifying acute HCV infection. However, owing to their complexity, neither approach is efficient. Here, we describe a new sequence alignment-free method to discriminate between recent (R) and chronic (C) HCV infection using next-generation sequencing (NGS) data derived from the HCV hypervariable region 1 (HVR1).
Year: 2017 PMID: 29244000 PMCID: PMC5731502 DOI: 10.1186/s12864-017-4269-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Differences in the population means of DNA nt and PhyChem features of HVR1 and their correlation to the R/C classes§
| Featuresᵃ | t-value (p-value)ᵇ | Means in R/C | Difference in means (95% C.I.) | R-value (95% C.I.) |
|---|---|---|---|---|
| Nt A | 47.86 (<2.20 × 10⁻¹⁶) | 0.162/0.181 | 0.019 (0.019, 0.020) | 0.497 (0.477, 0.516) |
| Nt G | 28.64 (<2.20 × 10⁻¹⁶) | 0.322/0.313 | 0.009 (0.008, 0.010) | 0.346 (0.323, 0.367) |
| Nt C | 24.26 (<2.20 × 10⁻¹⁶) | 0.294/0.286 | 0.008 (0.007, 0.008) | 0.332 (0.309, 0.355) |
| Nt T | 9.61 (<2.20 × 10⁻¹⁶) | 0.218/0.215 | 0.003 (0.002, 0.003) | 0.138 (0.112, 0.163) |
| Twist-tilt | 43.39 (<2.20 × 10⁻¹⁶) | 0.006/−0.010 | 0.016 (0.015, 0.017) | 0.539 (0.520, 0.557) |
| Slide-rise | 42.22 (<2.20 × 10⁻¹⁶) | −0.037/−0.058 | 0.021 (0.020, 0.022) | 0.500 (0.480, 0.519) |
| Enthalpy | 41.01 (<2.20 × 10⁻¹⁶) | −0.206/−0.250 | 0.044 (0.041, 0.045) | 0.497 (0.477, 0.516) |
| Breslauer-dH | 41.01 (<2.20 × 10⁻¹⁶) | −0.184/−0.231 | 0.047 (0.044, 0.048) | 0.494 (0.474, 0.513) |
| Breslauer-dG | 40.17 (<2.20 × 10⁻¹⁶) | −0.298/−0.273 | 0.025 (0.024, 0.026) | 0.477 (0.457, 0.497) |
| Protein-DNA twist | 37.96 (<2.20 × 10⁻¹⁶) | −0.326/−0.381 | 0.055 (0.051, 0.057) | 0.472 (0.451, 0.492) |
| Slide-2 | 36.90 (<2.20 × 10⁻¹⁶) | −0.380/−0.448 | 0.068 (0.064, 0.072) | 0.471 (0.450, 0.490) |
| SE-ZDNAᶜ | 36.46 (<2.20 × 10⁻¹⁶) | −0.264/−0.313 | 0.049 (0.045, 0.050) | 0.468 (0.447, 0.488) |
| Twist-1 | 36.91 (<2.20 × 10⁻¹⁶) | −0.302/−0.358 | 0.056 (0.052, 0.058) | 0.462 (0.442, 0.483) |
| G-content | 37.79 (<2.20 × 10⁻¹⁶) | −0.375/−0.434 | 0.059 (0.056, 0.062) | 0.457 (0.436, 0.477) |
| Helix coil transition | 34.05 (<2.20 × 10⁻¹⁶) | −0.285/−0.350 | 0.065 (0.062, 0.070) | 0.455 (0.434, 0.475) |
| MGDᵈ | 35.72 (<2.20 × 10⁻¹⁶) | 0.321/0.353 | 0.032 (0.030, 0.033) | 0.454 (0.433, 0.475) |
| Sugimoto_dG | 37.43 (<2.20 × 10⁻¹⁶) | 0.502/0.462 | 0.040 (0.037, 0.042) | 0.450 (0.429, 0.470) |
| Sugimoto_dS | 38.17 (<2.20 × 10⁻¹⁶) | 0.520/0.475 | 0.045 (0.043, 0.048) | 0.450 (0.429, 0.471) |
| Propeller twist | 34.58 (<2.20 × 10⁻¹⁶) | 0.196/0.148 | 0.048 (0.045, 0.050) | 0.448 (0.427, 0.469) |
ᵃ The four DNA-specific nt's and the 15 DNA-specific PhyChem properties of HVR1 sequences with R-values ≥0.5 are shown. A detailed description of the DNA PhyChem features used herein is available in [6, 7]
ᵇ The p-value is the same for the Welch two-sample t-test and Pearson's product-moment correlation test
ᶜ Abbreviation for Stabilizing Energy of Z-DNA
ᵈ Abbreviation for Minor Groove Distance
§ R-values, t-values and differences in means are reported as absolute values
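The two statistics reported in the table, a Welch two-sample t-value and a Pearson product-moment correlation against the R/C class, can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the function names `welch_t` and `pearson_r`, the 0/1 coding of the R/C classes, and the toy feature values are all assumptions made here.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of the mean of x
    vy = variance(y) / len(y)  # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def pearson_r(u, v):
    """Pearson product-moment correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical toy feature values (e.g. nt A frequency), NOT data from the study.
recent = [0.15, 0.16, 0.17, 0.16]
chronic = [0.18, 0.19, 0.17, 0.18]

t, df = welch_t(recent, chronic)
# Assumed class coding: R = 0, C = 1.
labels = [0] * len(recent) + [1] * len(chronic)
r = pearson_r(recent + chronic, labels)

# The table reports absolute values, so take abs() before comparing.
print(abs(t), df, abs(r))
```

Correlating a feature with a binary class label in this way gives the point-biserial correlation, which is why a single feature yields both a t-value and an R-value.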
Fig. 1. Distribution of HVR1 variants by DNA nt frequency. Shown are binned plots of 5681 HVR1 PhyChem variants derived from 222 patients. The y-axis denotes the fraction (percent) of variants with the same occurrence frequency (x-axis) for: (a) nt A, (b) nt G, (c) nt C and (d) nt T. R- and C-associated variants are shown as red and blue bars, respectively
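A nucleotide occurrence frequency of the kind plotted in Fig. 1 needs no alignment: it is the count of each base divided by the sequence length. The following is a minimal sketch under assumptions made here; the function name `nt_frequencies`, the choice to ignore non-ACGT characters, and the example sequence are hypothetical and not taken from the study.

```python
from collections import Counter

def nt_frequencies(seq):
    """Return the fraction of A, C, G and T in a DNA sequence.

    Characters other than A/C/G/T (e.g. N or gaps) are ignored,
    which is an assumption of this sketch.
    """
    counts = Counter(seq.upper())
    n = sum(counts[base] for base in "ACGT")
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / n for base in "ACGT"}

# Hypothetical short sequence, not an actual HVR1 variant.
freqs = nt_frequencies("ACGGTCAG")
print(freqs)
```

Because each variant is reduced to a fixed-length vector of frequencies, variants of different lengths can be compared directly.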
Fig. 2. Distribution of HVR1 variants by DNA PhyChem property. Shown are binned plots of 5681 HVR1 PhyChem variants derived from 222 patients. The y-axis denotes the fraction (percent) of variants with the same range of values (x-axis) for the PhyChem indexes: (a) Twist_tilt, (b) Slide_rise, (c) Enthalpy, (d) Breslauer_dH and (e) Sugimoto_dH. The Sugimoto_dH index is an example of a DNA PhyChem property found to have a small but significant correlation (r = 0.102; p < 1.38 × 10⁻¹⁴) to the R/C classes. R- and C-associated variants are shown in red and blue, respectively
Fig. 3. Distribution of HVR1 variants in pairwise DNA PhyChem property plots. Shown are two-dimensional (2D) plots of 5681 HVR1 PhyChem variants derived from 222 patients. The x-axis represents the range of values of the PhyChem indices: (a) Breslauer_dH; (b) Enthalpy; (c) Slide_rise, and (d) Sugimoto_dH. The y-axis denotes the range of values of the Twist-tilt PhyChem index. R- and C-associated variants are shown in red and blue, respectively
Fig. 4. Spatial distribution of HVR1 variants in a 2D MDS plot. Sammon mapping of 5681 HVR1 PhyChem variants derived from 222 patients. The MDS plot has an average stress of 0.386385 after 224 iterations. R- and C-associated variants are shown as red and blue points, respectively
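The stress value quoted for Fig. 4 measures how well the 2D layout preserves the pairwise distances of the original feature space. As a hedged illustration, the sketch below computes the standard Sammon stress for a given layout. The function name `sammon_stress`, the use of Euclidean distance, and the toy points are assumptions made here; the software used in the study may normalize its "average stress" differently.

```python
import math
from itertools import combinations

def sammon_stress(high, low):
    """Sammon stress between original points and their low-dimensional layout.

    stress = (1 / sum(d*_ij)) * sum((d*_ij - d_ij)^2 / d*_ij)
    where d*_ij are distances in the original space and d_ij in the layout.
    Assumes no two original points coincide (d*_ij > 0).
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_star = math.dist(high[i], high[j])  # distance in original space
        d = math.dist(low[i], low[j])         # distance in the 2D layout
        denominator += d_star
        numerator += (d_star - d) ** 2 / d_star
    return numerator / denominator

# Hypothetical toy points, not study data.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
exact_layout = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # preserves all distances
doubled_layout = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]  # doubles every distance

print(sammon_stress(high, exact_layout), sammon_stress(high, doubled_layout))
```

A layout that preserves every distance has zero stress. Sammon mapping itself iteratively moves the 2D points to reduce this quantity, and that optimization step is not shown here.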
Table 2. RBFNN performance in R/C classification of intra-host HVR1 PhyChem variantsᵃ
| Dataset | CA | F1 measure | MCC | AUROC |
|---|---|---|---|---|
| Full train setᵇ | 95.795% | 0.958 | 0.910 | 0.986 |
| Train set | 94.847%ᶜ | 0.948ᶜ | 0.890ᶜ | 0.979ᶜ |
| Test set | 84.145%ᵈ | 0.842ᵈ | 0.670ᵈ | 0.912ᵈ |
| Random-labeled train set | 59.038%ᵉ (±1.28) | 0.521ᵉ (±0.007) | −0.007ᵉ (±0.022) | 0.501ᵉ (±0.012) |
| Test set | 39.965%ᶠ (±1.948) | 0.280ᶠ (±0.070) | 0.003ᶠ (±0.145) | 0.385ᶠ (±0.144) |
ᵃ For a description of the train/test data, see the Methods section
ᵇ Values obtained from the RBFNN classifier trained on the entire training dataset without cross-validation (CV)
ᶜ Overall value is the average over the 10-fold CV (10xCV) data
ᵈ Value obtained from the RBFNN classifier trained on the training dataset by 10xCV
ᵉ Overall value is the average of the 10xCV data obtained from 4 datasets; standard deviation (SD) in parentheses
ᶠ Overall value is the average obtained from 4 RBFNN classifiers trained on randomly labeled data by 10xCV; SD in parentheses
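Three of the four performance measures in the table (CA, F1 measure and MCC) follow directly from a binary confusion matrix, as the sketch below illustrates. The function name `binary_metrics` and the counts are hypothetical and were chosen only for illustration. The F1 here is the single-class F1 for the class treated as positive, whereas the study may report a class-weighted average. AUROC is omitted because it requires ranked classifier scores rather than counts.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Classification accuracy, F1 and Matthews correlation coefficient."""
    total = tp + fp + fn + tn
    ca = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"CA": ca, "F1": f1, "MCC": mcc}

# Hypothetical counts, not results from the study.
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(m)
```

MCC ranges from −1 to 1 and stays near 0 for a classifier that guesses at random, which is why it is a useful check on the random-labeled rows, where accuracy alone can look deceptively high on imbalanced data.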
Table 3. Comparison of RBFNN performance on randomized datasets in 100 10xCV tests§
| Dataset | No. CV runs | CA | F1 measure | MCC | AUROC |
|---|---|---|---|---|---|
| Train set 1ᵃ | 1000 | 94.943% (±1.067) | 0.960 (±0.009) | 0.892 (±0.023) | 0.981 (±0.005) |
| Train set 2 | 1000 | 95.958% (±0.717) | 0.974 (±0.005) | 0.887 (±0.020) | 0.986 (±0.003) |
| Train set 3 | 1000 | 96.014% (±0.719) | 0.974 (±0.005) | 0.889 (±0.020) | 0.987 (±0.003) |
| Train set 4 | 1000 | 95.981% (±0.699) | 0.974 (±0.005) | 0.889 (±0.019) | 0.986 (±0.003) |
§ Comparisons are based on the corrected two-tailed t-test at a significance level of p < 0.001
ᵃ Dataset used to train (fit) the RBFNN classifier (1st and 2nd rows in Table 2)
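The record does not say which correction the "corrected" t-test applies. One widely used option for repeated cross-validation is the corrected resampled t-test of Nadeau and Bengio, which inflates the variance term to account for overlapping training sets. The sketch below assumes that variant. The function name, the per-run differences and the 1/9 test-to-train ratio (10-fold CV) are illustrative assumptions, not details confirmed by the source.

```python
import math
from statistics import mean, variance

def corrected_resampled_t(diffs, test_train_ratio):
    """Corrected resampled t statistic (Nadeau-Bengio variance correction).

    diffs: per-run differences in a performance measure between two models.
    test_train_ratio: n_test / n_train for each run (1/9 for 10-fold CV).
    The statistic is referred to a t distribution with len(diffs) - 1
    degrees of freedom.
    """
    k = len(diffs)
    return mean(diffs) / math.sqrt((1 / k + test_train_ratio) * variance(diffs))

# Hypothetical per-run accuracy differences, not data from the study.
diffs = [0.01, 0.03, 0.02, 0.02, 0.02]

t_corrected = corrected_resampled_t(diffs, 1 / 9)
t_uncorrected = corrected_resampled_t(diffs, 0.0)  # ordinary paired t statistic
print(t_corrected, t_uncorrected)
```

Setting the ratio to zero recovers the ordinary paired t statistic. The correction always shrinks the statistic, which makes the test more conservative when runs share training data.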