Wen-Chun Liu, Chih-Peng Lin, Chun-Pei Cheng, Cheng-Hsun Ho, Kuo-Lun Lan, Ji-Hong Cheng, Chia-Jui Yen, Pin-Nan Cheng, I-Chin Wu, I-Chen Li, Bill Chia-Han Chang, Vincent S Tseng, Yen-Cheng Chiu, Ting-Tsung Chang.
Abstract
BACKGROUND: Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is a powerful tool for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references for each sample: the sample-specific reference sequence, sequences of the same genotype, and sequences of a different genotype.
Keywords: Alignment stage; Coverage; Divergence; Single nucleotide variants
Year: 2015 PMID: 26208819 PMCID: PMC4722079 DOI: 10.1007/s12072-015-9645-x
Source DB: PubMed Journal: Hepatol Int ISSN: 1936-0533 Impact factor: 6.047
Characteristics of patients with different HBV genotypes
| Variables | Genotype B | Genotype C | p value |
|---|---|---|---|
| Gender (m:f) | 27:7 | 40:12 | 1.00 |
| Age (years) | 51.4 ± 9.5 | 51.9 ± 8.8 | 0.80 |
| Albumin (g/dL) | 4.3 ± 0.4 | 4.2 ± 0.4 | 0.19 |
| AST (IU/L) | 116.5 ± 131.2 | 172.4 ± 174.4 | 0.11 |
| ALT (IU/L) | 177.8 ± 186.8 | 214.9 ± 207.5 | 0.39 |
| Creatinine (mg/dL) | 0.9 ± 0.2 | 0.9 ± 0.2 | 0.53 |
| Total bilirubin (mg/dL) | 1.6 ± 4.0 | 1.1 ± 0.8 | 0.39 |
| HBeAg (+/−) | 5/29 | 14/38 | 0.28 |
| HBV DNA (log10 IU/mL) | 6.5 ± 1.4 | 6.6 ± 1.5 | 0.76 |
| Cirrhosis (+/−) | 7/27 | 22/30 | 0.06 |
Data of continuous variables are mean ± SD. p values for continuous variables and nominal variables are from two-tailed independent t tests and χ² tests, respectively
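The footnote names the two tests but not their exact variants. As a minimal sketch, the snippet below recomputes a few p values from the summary counts in the table using only the Python standard library. Two choices here are assumptions, not stated in the source: the χ² test uses Yates' continuity correction, and the t test uses a pooled variance with a normal approximation for the p value (reasonable at about 84 degrees of freedom). The function names are hypothetical.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Return (chi2, p) for the 2x2 table [[a, b], [c, d]].

    Uses Yates' continuity correction (an assumption; the source only
    says "chi-square test").
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected numerator, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Return (t, approx_p) for an independent two-sample t test
    computed from means, SDs, and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    # Two-tailed p via the normal approximation (assumption, see lead-in).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Counts and summary statistics taken from the table above.
# Group sizes follow from the gender row: 27 + 7 = 34 and 40 + 12 = 52.
_, p_gender = chi2_2x2_yates(27, 7, 40, 12)
_, p_cirrhosis = chi2_2x2_yates(7, 27, 22, 30)
_, p_hbeag = chi2_2x2_yates(5, 29, 14, 38)
_, p_age = pooled_t_from_summary(51.4, 9.5, 34, 51.9, 8.8, 52)

for name, p in [("gender", p_gender), ("cirrhosis", p_cirrhosis),
                ("HBeAg", p_hbeag), ("age", p_age)]:
    print(f"{name}: p = {p:.2f}")
```

Under these assumptions the recomputed values land close to the reported ones, which suggests (but does not prove) that a continuity-corrected χ² test was used.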
Mapping NGS datasets of genotype C patients (n = 52) to different references of HBV full genome
| Variables | FJ787477 | KJ790200 | JN315779 | KJ790199 | Sample specific reference |
|---|---|---|---|---|---|
| Mappable reads (%) | 86.07 ± 9.13*** | 86.96 ± 8.94*** | 89.08 ± 9.06*** | 89.30 ± 8.95***a | 89.41 ± 8.97 |
| Properly paired reads (%) | 83.28 ± 9.07*** | 84.41 ± 8.89*** | 86.65 ± 9.26 | 86.81 ± 9.13 | 86.89 ± 9.16 |
| Broken paired reads (%) | 0.99 ± 0.91** | 0.99 ± 0.89** | 1.50 ± 1.27 | 1.51 ± 1.28 | 1.75 ± 1.88 |
| Singleton (%) | 1.80 ± 0.76*** | 1.56 ± 0.67*** | 0.93 ± 0.59*** | 0.98 ± 0.58*** | 0.77 ± 0.56 |
| Minimum coverage per nucleotide | 93 ± 523*** | 277 ± 782*** | 1439 ± 2693 | 1471 ± 2698 | 1521 ± 2965 |
| Maximum coverage per nucleotide | 190,649 ± 454,110 | 192,928 ± 461,547 | 195,836 ± 475,955 | 197,166 ± 476,190 | 197,173 ± 475,579 |
| Average coverage per nucleotide | 33,833 ± 64,303** | 34,214.4 ± 65,765** | 38,287 ± 75,400 | 38,335 ± 75,419a | 38,362.4 ± 75,502 |
| Nucleotides covered <30 (%) | 0.85 ± 1.16*** | 0.35 ± 1.10** | 0.23 ± 1.12 | 0.24 ± 1.15 | 0.22 ± 1.07 |
| Nucleotides covered >1000 (%) | 93.69 ± 5.23*** | 94.85 ± 4.99*** | 97.15 ± 4.40*** | 97.17 ± 4.40 | 97.19 ± 4.40 |
Total reads after quality trimming = 1,504,374 ± 2,780,326; data are mean ± SD; FJ787477, KJ790200, JN315779, and KJ790199 were from the GenBank database; Geno. genotype; the sample-specific reference was derived from the NGS reads aligned to JN315779; p values for differences between the sample-specific reference and each reference from the GenBank database (*p < 0.05; **p < 0.01; ***p < 0.001) and for differences between JN315779 and KJ790199 (a p < 0.01) are from two-tailed independent t tests
Fig. 1 A comparison of different mapping reference sequences and their derived consensus sequences of NGS reads from Clone_N6 (genotype C) with the direct sequence of Clone_N6. The Asian genotype B (GenBank accession number FJ787477) and genotype C (GenBank accession number JN315779) references were from the NCBI GenBank database. The sample-specific reference was a consensus sequence obtained from Clone_N6 NGS reads aligned to JN315779. Each derived consensus sequence was obtained from Clone_N6 NGS reads aligned against the mapping reference that precedes it. The thick lines indicate false SNVs in the derived consensus sequence
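The caption describes the sample-specific reference as a consensus built from reads aligned to a GenBank reference. The sketch below illustrates that idea on per-position base counts. It is not the authors' pipeline: the function name, the pileup data structure, and the `min_depth` fallback threshold are all hypothetical.

```python
from collections import Counter

def consensus_from_pileup(reference, pileup, min_depth=30):
    """Build a consensus sequence from per-position base counts.

    reference : str, the initial mapping reference (e.g. a GenBank genome)
    pileup    : dict mapping 0-based position -> Counter of observed bases
    min_depth : hypothetical threshold; below it the reference base is kept
    """
    consensus = []
    for pos, ref_base in enumerate(reference):
        counts = pileup.get(pos, Counter())
        depth = sum(counts.values())
        if depth < min_depth:
            # Too few reads to trust the majority call; keep the reference.
            consensus.append(ref_base)
        else:
            # Majority base among the aligned reads at this position.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)

# Toy example: a 6-nt reference and a handful of base counts.
reference = "ACGTAC"
pileup = {
    0: Counter({"A": 100}),
    1: Counter({"C": 10, "T": 90}),   # majority differs from reference
    2: Counter({"G": 5}),             # below min_depth, reference kept
    3: Counter({"T": 60, "A": 40}),
}
sample_ref = consensus_from_pileup(reference, pileup)
print(sample_ref)
```

Reads would then be realigned against `sample_ref` for variant calling, which is the second pass the caption refers to.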
Coverage and percentage of false SNVs of NGS reads (Clone_N6, genotype C) with alignment to different references of HBV full genome
| NT | N6 sequence/false SNVs | Sample-specific reference | JN315779 (Geno. C, Asia)a | FJ787477 (Geno. B, Asia)a |
|---|---|---|---|---|
| 929 | T/A | 586,307/229 (99.8/0.1) | 17,271/456 (97.3/2.6) | 116/449 (20.5/79.5) |
| 930 | C/A | 578,003/378 (99.8/0.1) | 17,200/425 (97.4/2.4) | 62/402 (13.2/85.4) |
| 934 | C/A | 542,825/281 (99.9/0.1) | 17,234/395 (97.7/2.2) | 75/367 (17.0/83.0) |
| 939 | G/A | 505,379/307 (99.9/0.1) | 17,548/59 (99.7/0.0) | 10/407 (2.4/97.6) |
| 940 | C/A | 498,076/149 (99.9/0.0) | 17,530/76 (99.6/0.0) | 12/402 (2.9/97.1) |
| 941 | A/T | 493,990/49 (99.9/0.0) | 11,199/325 (97.2/0.1) | 12/300 (3.8/95.9) |
| 942 | A/G | 489,435/248 (99.9/0.1) | 15,823/424 (97.4/0.1) | 23/304 (7.0/93.0) |
| 2733 | A/C | 729,923/160 (99.9/0.0) | 61,679/193 (99.7/0.3) | 41/115 (26.3/73.7) |
| 2735 | A/G | 720,464/432 (99.9/0.1) | 49,152/393 (99.2/0.8) | 25/121 (17.1/82.9) |
| 2738 | C/G | 700,914/218 (99.9/0.0) | 49,206/200 (99.5/0.4) | 25/103 (18.7/76.9) |
| 2741 | G/A | 683,135/889 (99.9/0.1) | 48,724/789 (98.4/1.6) | 29/124 (19.0/81.1) |
| 2980 | T/C | 739,842/2381 (99.7/0.3) | 421,040/2437 (99.4/0.6) | 1308/2207 (37.2/62.8) |
| 2988 | C/G | 756,854/793 (99.8/0.1) | 33,655/1712 (95.1/4.8) | 884/1670 (34.5/65.3) |
| 2989 | A/C | 757,643/834 (99.8/0.1) | 33,597/1762 (95.0/5.0) | 844/1718 (32.9/67.0) |
| 2997 | T/C | 752,699/1392 (99.8/0.2) | 32,840/2108 (94.0/6.0) | 415/1810 (18.7/81.4) |
| 2998 | C/A | 754,680/846 (99.8/0.1) | 33,348/1753 (95.0/5.0) | 758/1357 (35.8/64.1) |
| 3006 | A/G | 730,064/1055 (99.8/0.1) | 32,595/1863 (94.6/5.4) | 25/1416 (1.7/98.1) |
| 3009 | G/C | 718,234/667 (99.9/0.1) | 32,503/1615 (95.1/4.7) | 1/1296 (0.1/99.8) |
| 3011 | A/C | 708,480/967 (99.8/0.1) | 32,393/1688 (95.0/4.9) | 0/1266 (0.0/100.0) |
| 3012 | A/C | 705,693/646 (99.9/0.1) | 32,481/1596 (95.3/4.7) | 0/941 (0.0/99.8) |
| 3015 | T/C | 691,698/806 (99.9/0.1) | 32,366/1695 (95.0/5.0) | 1/946 (0.1/96.4) |
Data are coverage (%); sample-specific reference was from the NGS reads with alignment to JN315779
NT nucleotide, SNV single nucleotide variation, Geno. genotype
aFJ787477 and JN315779 were from the GenBank database
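Each cell in the table pairs read counts with percentages. As a rough sketch of that arithmetic, the helper below converts a true-base count and a false-SNV count into percentages. The function name is hypothetical, and the optional `total_depth` argument is an assumption: in several rows the two percentages do not sum to 100, which suggests the denominator is the total depth at the position (including other bases) rather than the sum of the two counts shown.

```python
def base_percentages(true_count, false_count, total_depth=None):
    """Return (true %, false %) rounded to one decimal place.

    If total_depth is omitted, the two counts are assumed to account
    for all reads covering the position.
    """
    if total_depth is None:
        total_depth = true_count + false_count
    if total_depth == 0:
        return (0.0, 0.0)
    return (
        round(100 * true_count / total_depth, 1),
        round(100 * false_count / total_depth, 1),
    )

# nt 929 and nt 3011 mapped to FJ787477 (genotype B), counts from the table.
nt929 = base_percentages(116, 449)
nt3011 = base_percentages(0, 1266)
print(nt929, nt3011)
```

For these two rows the result matches the tabulated percentages; rows whose percentages sum to less than 100 would need the actual total depth, which the table does not list.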
Fig. 2 a Sequence divergence between Clone_N6 (genotype C) and the mapping references FJ787477 (genotype B, Asia) and JN315779 (genotype C, Asia), respectively. Arrows indicate the three regions with the highest divergence (over 18 %), at nt929–942, nt2733–2741, and nt2980–3015. b Mean sequence divergence between the derived consensus sequences from NGS reads of 52 patients with genotype C chronic hepatitis B and the mapping references FJ787477 (genotype B, Asia) and JN315779 (genotype C, Asia). A schematic diagram of the hepatitis B virus complete genome and its four genes is shown in the bottom panel; the positions correspond to the x-axis of (a, b)
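The caption reports divergence along the genome and flags regions above 18 %. The sketch below shows one way such a profile could be computed: a sliding-window mismatch percentage between two pre-aligned sequences of equal length. The window size, the function names, and the treatment of the sequences as gap-free are assumptions; the source does not state how divergence was windowed.

```python
def windowed_divergence(seq_a, seq_b, window=10):
    """Return [(start, percent_mismatch), ...] with 1-based window starts.

    Both sequences must already be aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [int(a != b) for a, b in zip(seq_a, seq_b)]
    profile = []
    for start in range(len(mismatches) - window + 1):
        pct = 100 * sum(mismatches[start:start + window]) / window
        profile.append((start + 1, pct))
    return profile

def high_divergence_starts(profile, threshold=18.0):
    """Window starts whose divergence exceeds the threshold (in percent)."""
    return [start for start, pct in profile if pct > threshold]

# Toy example: three consecutive mismatches at positions 6-8.
seq_a = "A" * 20
seq_b = "A" * 5 + "CCC" + "A" * 12
profile = windowed_divergence(seq_a, seq_b, window=10)
hot = high_divergence_starts(profile)
print(hot)
```

On real genomes the same profile would be computed between a sample sequence and each candidate mapping reference, and the flagged windows compared with the positions where false SNVs cluster.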
Prevalence of probable false SNVs in genotype C patients (n = 52) when mapped to different HBV full-genome references
| NT | Mapping reference (Geno. C/B) | Inconsistent SNVs comprising derived consensus sequences | Number of patients (%) |
|---|---|---|---|
| JN315779 (Geno. C, Asia) → FJ787477 (Geno. B, Asia) | | | |
| 939 | G/A | G → A | 38 (73.1) |
| 940 | C/A | C → A | 38 (73.1) |
| 941 | A/T | A → T | 27 (51.9) |
| 942 | A/G | A → G | 23 (44.2) |
| 1353 | T/C | T → C | 15 (28.8) |
| 1356 | G/C | G → C | 15 (28.8) |
| 1359 | A/G | A → G | 13 (25.0) |
| 1362 | C/T | C → T | 12 (23.1) |
| 2980 | T/C | T → C | 17 (32.7) |
| 2988 | C/G | C → G | 18 (34.6) |
| 2989 | A/C | A → C | 20 (38.5) |
| 2997 | T/C | T → C | 25 (48.1) |
| 2998 | C/C | C → A | 22 (42.3) |
| 3006 | A/G | A → G | 26 (50.0) |
| 3009 | G/C | G(29)/A(2) → C | 31 (59.6) |
| 3012 | A/T | A → C | 31 (59.6) |
| 3015 | T/G | T → C(23)/G(17) | 40 (76.9) |
Inconsistent SNVs comprising derived consensus sequences from at least 20 % of patients are shown
SNV single nucleotide variation, NT nucleotide, Geno. genotype
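The table counts, per position, the patients whose derived consensus changes when the mapping reference is switched, and keeps positions affected in at least 20 % of patients. A minimal sketch of that tally follows; the function name and the dictionary-of-strings input format are hypothetical.

```python
from collections import Counter

def inconsistent_snv_prevalence(consensus_c, consensus_b, min_fraction=0.20):
    """Tally positions where the two derived consensus sequences disagree.

    consensus_c, consensus_b : dict patient_id -> consensus string obtained
        with the genotype C and genotype B mapping reference, respectively
    Returns {position (1-based): (n_patients, percent)} for positions that
    are inconsistent in at least min_fraction of patients.
    """
    n_patients = len(consensus_c)
    counts = Counter()
    for patient, seq_c in consensus_c.items():
        seq_b = consensus_b[patient]
        for pos, (base_c, base_b) in enumerate(zip(seq_c, seq_b), start=1):
            if base_c != base_b:
                counts[pos] += 1
    return {
        pos: (k, round(100 * k / n_patients, 1))
        for pos, k in sorted(counts.items())
        if k / n_patients >= min_fraction
    }

# Toy example with six patients and 4-nt sequences.
consensus_c = {p: "ACGT" for p in ("p1", "p2", "p3", "p4", "p5", "p6")}
consensus_b = {"p1": "ATGT", "p2": "ATGT", "p3": "ATGT",
               "p4": "ACGA", "p5": "ACGT", "p6": "ACGT"}
prevalence = inconsistent_snv_prevalence(consensus_c, consensus_b)
print(prevalence)
```

Position 4 differs in only one of six patients and is filtered out, mirroring the 20 % cutoff in the footnote. The same percentage arithmetic reproduces table entries such as 38 of 52 patients (73.1 %).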
Fig. 3 Comparison of mean sequence divergences between different hepatitis B virus (HBV) genotype populations. A total of 158 HBV strains (34 genotype A, 33 genotype B, 39 genotype C, and 52 genotype D) were collected from the GenBank database and analyzed. a Mean sequence divergences within the same genotype. b, c Mean sequence divergences between different genotypes. A schematic diagram of the hepatitis B virus complete genome and its four genes is shown in the bottom panel; the positions correspond to the x-axis of (a–c). Geno. A genotype A, Geno. B genotype B, Geno. C genotype C, Geno. D genotype D
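Mean divergence within and between genotype populations can be sketched as an average over sequence pairs. The snippet below assumes pre-aligned sequences of equal length and computes whole-sequence values; the figure plots divergence along the genome, so in practice the same averaging would be applied per window. All function names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean

def divergence(seq_a, seq_b):
    """Percent of mismatching positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return 100 * sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def mean_divergence_within(seqs):
    """Mean divergence over all unordered pairs inside one genotype."""
    return mean(divergence(a, b) for a, b in combinations(seqs, 2))

def mean_divergence_between(group_1, group_2):
    """Mean divergence over all cross-genotype pairs."""
    return mean(divergence(a, b) for a, b in product(group_1, group_2))

# Toy 4-nt "genomes" standing in for two genotype populations.
geno_b = ["ACGT", "ACGA"]
geno_c = ["TCGT", "TCGA"]
within_b = mean_divergence_within(geno_b)
between_bc = mean_divergence_between(geno_b, geno_c)
print(within_b, between_bc)
```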