Bohu Pan, Rebecca Kusko, Wenming Xiao, Yuanting Zheng, Zhichao Liu, Chunlin Xiao, Sugunadevi Sakkiah, Wenjing Guo, Ping Gong, Chaoyang Zhang, Weigong Ge, Leming Shi, Weida Tong, Huixiao Hong.
Abstract
BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed.
Keywords: Calling pipeline comparison; Human reference genomes; Next generation sequencing; SNV
Year: 2019 PMID: 30871461 PMCID: PMC6419332 DOI: 10.1186/s12859-019-2620-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Study design. Whole genome sequencing data from the GIAB reference sample NA12878 were downloaded and aligned to human genomes HG19 and HG38 using three aligners, followed by SNV calling with various calling algorithms. The SNVs were then converted between the two reference genomes using Picard and CrossMap. To pinpoint discordant SNVs, converted SNVs were compared against SNVs identified directly on the target reference genome version. Finally, discordant SNVs were characterized by read depth, low-confidence frequency, and prevalence of G/C reference alleles
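The comparison step in the study design can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that a converted SNV counts as "position discordant" when no directly called SNV exists at the converted coordinate, and as "genotype discordant" when a direct call exists there but with a different genotype; the record does not spell out these definitions. The function name `classify_converted_snvs` and the dictionary layout are hypothetical.

```python
from collections import Counter

def classify_converted_snvs(converted, direct):
    """Label each converted SNV by comparing it with direct calls.

    Both arguments map (chrom, pos) -> genotype string such as "0/1".
    `converted` holds SNVs lifted over from the other genome version;
    `direct` holds SNVs called directly on the target genome version.
    The discordance definitions used here are assumptions.
    """
    labels = {}
    for site, genotype in converted.items():
        if site not in direct:
            # No direct call at the converted coordinate.
            labels[site] = "position_discordant"
        elif direct[site] != genotype:
            # Same coordinate, different genotype.
            labels[site] = "genotype_discordant"
        else:
            labels[site] = "concordant"
    return labels

# Toy data, invented purely for illustration.
converted = {("chr1", 1000): "0/1", ("chr1", 2000): "1/1", ("chr2", 500): "0/1"}
direct = {("chr1", 1000): "0/1", ("chr1", 2000): "0/1"}

labels = classify_converted_snvs(converted, direct)
counts = Counter(labels.values())
print(counts)
```

In practice the two inputs would be parsed from VCF files, which is omitted here to keep the sketch self-contained.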
Table 1 The 26 SNV calling pipelines
| Number | Aligner | GATK recalibration | Caller |
|---|---|---|---|
| 1 | Novoalign | N | FreeBayes |
| 2 | Novoalign | N | HC |
| 3 | ISAAC | N | FreeBayes |
| 4 | ISAAC | Y | FreeBayes |
| 5 | ISAAC | N | HC |
| 6 | ISAAC | Y | HC |
| 7 | ISAAC | N | ISAAC |
| 8 | ISAAC | Y | ISAAC |
| 9 | ISAAC | N | SAMtools |
| 10 | ISAAC | Y | SAMtools |
| 11 | BWA | N | FreeBayes |
| 12 | BWA | Y | FreeBayes |
| 13 | BWA | N | HC |
| 14 | BWA | Y | HC |
| 15 | BWA | N | ISAAC |
| 16 | BWA | Y | ISAAC |
| 17 | BWA | N | SAMtools |
| 18 | BWA | Y | SAMtools |
| 19 | Bowtie2 | N | FreeBayes |
| 20 | Bowtie2 | Y | FreeBayes |
| 21 | Bowtie2 | N | HC |
| 22 | Bowtie2 | Y | HC |
| 23 | Bowtie2 | N | ISAAC |
| 24 | Bowtie2 | Y | ISAAC |
| 25 | Bowtie2 | N | SAMtools |
| 26 | Bowtie2 | Y | SAMtools |
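The layout of the pipeline table can be reproduced programmatically, which makes the count of 26 easy to verify. The sketch below only mirrors the rows listed above: two Novoalign pipelines without recalibration, plus every combination of three aligners, four callers, and recalibration on or off. The function name `build_pipelines` is hypothetical, and the caller labels are copied from the table without expansion.

```python
from itertools import product

CALLERS = ["FreeBayes", "HC", "ISAAC", "SAMtools"]

def build_pipelines():
    """Return (number, aligner, recalibration, caller) tuples in table order."""
    # Novoalign appears only without GATK recalibration and with two callers.
    combos = [("Novoalign", "N", "FreeBayes"), ("Novoalign", "N", "HC")]
    for aligner in ["ISAAC", "BWA", "Bowtie2"]:
        # For each caller, list the non-recalibrated run before the recalibrated one.
        for caller, recal in product(CALLERS, ["N", "Y"]):
            combos.append((aligner, recal, caller))
    return [(number, *combo) for number, combo in enumerate(combos, start=1)]

pipelines = build_pipelines()
print(len(pipelines))  # 2 + 3 * 4 * 2 = 26
for row in pipelines[:4]:
    print(row)
```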
Alignment rates between genome versions and aligners
| Aligner | HG19 alignment rate (%) | HG38 alignment rate (%) |
|---|---|---|
| Bowtie2 | 98.559 | 98.503 |
| BWA | 99.629 | 99.633 |
| ISAAC | 99.146 | 99.034 |
Fig. 2 Distribution of genomic coverage. The coverage from each pipeline is plotted against frequency on a log scale, with HG19 as red lines and HG38 as blue lines. The two sub-figures in each row correspond to one aligner, named in the title above them. The three sub-figures in the left panel are alignment results without GATK realignment, while the right panel contains alignment results with GATK realignment
Fig. 3 SNVs called from different pipelines. The number of SNVs (y-axis) is plotted as bar height. The x-axis contains pipeline numbers found in Table 1. The blue bars represent HG19 alignments and the red bars represent HG38 alignments
Fig. 4 Conversion rates. The conversion rates obtained from Picard are plotted as open circles and the conversion rates obtained from CrossMap as filled diamonds. Results from converting HG38 to HG19 are in blue and results from converting HG19 to HG38 are in red. The x-axis contains pipeline numbers found in Table 1. The y-axis depicts the conversion rates
Fig. 5 Depth distribution of the converted and not converted SNVs identified from BWA alignment. The number of SNVs (y-axis) is plotted against depth (x-axis) for SNVs called using FreeBayes (blue), HC (magenta), ISAAC (red), and SAMtools (cyan). The solid lines are conversion results from HG19 to HG38. The dotted lines are conversion results from HG38 to HG19. a Successfully converted SNVs using CrossMap. b SNVs that were not successfully converted using CrossMap. c Successfully converted SNVs using Picard. d SNVs that were not successfully converted using Picard
Fig. 6 Discordant SNVs. a Rates of discordant SNVs among the successfully converted SNVs are shown on the y-axis. b Ratios of position discordant SNVs to genotype discordant SNVs are shown on the y-axis. The results from Picard are open circles and the results from CrossMap are filled diamonds. Conversions from HG38 to HG19 are in blue and conversions from HG19 to HG38 are in red. The x-axis contains pipeline numbers from Table 1
Fig. 7 Ratios of LC to HC discordant SNVs. a Log2 values of the ratios for position discordant SNVs are on the y-axis. b Log2 values of the ratios for genotype discordant SNVs are on the y-axis. The results from Picard are open circles and the results from CrossMap are filled diamonds. Conversions from HG38 to HG19 are in blue and conversions from HG19 to HG38 are in red. Numbers along the x-axis are the pipeline numbers in Table 1
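The quantity plotted in Fig. 7 is a log2 ratio of two counts. A minimal sketch of that calculation follows, assuming that LC and HC denote discordant SNVs falling in low-confidence and high-confidence regions respectively (the caption does not expand the abbreviations). The function name `log2_lc_to_hc` and the example counts are hypothetical.

```python
import math

def log2_lc_to_hc(lc_count, hc_count):
    """Return log2(LC / HC) for two counts of discordant SNVs.

    Positive values mean more discordant SNVs in the LC set,
    negative values mean more in the HC set, and 0 means equal counts.
    """
    if lc_count <= 0 or hc_count <= 0:
        # The log ratio is undefined when either count is zero or negative.
        raise ValueError("both counts must be positive")
    return math.log2(lc_count / hc_count)

# Invented counts, for illustration only.
example = log2_lc_to_hc(8, 2)
print(example)
```

A log scale makes enrichment and depletion symmetric around zero, so a fourfold excess and a fourfold deficit sit at the same distance from the axis.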
Base composition of discordant SNVs in percentages (mean ± standard deviation)
| Type | Base | Picard HG38➔HG19 | Picard HG19➔HG38 | CrossMap HG38➔HG19 | CrossMap HG19➔HG38 |
|---|---|---|---|---|---|
| Position discordant SNVs | A | 23.43 ± 0.78 | 23.74 ± 0.81 | 23.32 ± 0.81 | 23.66 ± 0.80 |
| | T | 23.86 ± 0.67 | 24.02 ± 0.86 | 23.64 ± 0.70 | 23.89 ± 0.85 |
| | G | 26.21 ± 0.71 | 25.92 ± 0.8 | 26.37 ± 0.75 | 26.06 ± 0.78 |
| | C | 26.5 ± 0.81 | 26.32 ± 0.87 | 26.62 ± 0.81 | 26.35 ± 0.89 |
| | G + C | 52.71 ± 1.44 | 52.24 ± 1.64 | 52.99 ± 1.49 | 52.41 ± 1.64 |
| Genotype discordant SNVs | A | 23.06 ± 1.61 | 23.07 ± 1.66 | 23.49 ± 0.88 | 23.57 ± 0.91 |
| | T | 23.11 ± 1.66 | 23.07 ± 1.75 | 23.59 ± 0.81 | 23.67 ± 0.93 |
| | G | 26.74 ± 1.65 | 26.84 ± 1.92 | 26.79 ± 0.81 | 26.19 ± 1.16 |
| | C | 27.08 ± 1.47 | 27.02 ± 1.45 | 26.13 ± 0.92 | 26.57 ± 0.67 |
| | G + C | 53.82 ± 2.93 | 53.86 ± 3.13 | 52.92 ± 1.62 | 52.77 ± 1.73 |
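The per-base percentages in this table, including the G + C rows, can be derived from a list of reference alleles at discordant sites. The sketch below shows one way to do that for a single pipeline; it is an illustration under assumptions, not the authors' procedure. The function name `base_composition` and the toy allele list are hypothetical, and averaging across pipelines to obtain the mean ± standard deviation is left out.

```python
from collections import Counter

def base_composition(ref_alleles):
    """Return percentages of A, T, G, C and G+C among reference alleles."""
    counts = Counter(allele.upper() for allele in ref_alleles)
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("no A/T/G/C alleles supplied")
    percentages = {base: 100.0 * counts[base] / total for base in "ATGC"}
    # G+C is the sum of the two individual percentages.
    percentages["G+C"] = percentages["G"] + percentages["C"]
    return percentages

# Toy reference alleles, invented for illustration.
toy_alleles = ["G", "C", "A", "T", "G", "C", "G", "A"]
composition = base_composition(toy_alleles)
print(composition)

# Consistency check on one table column (position discordant, Picard HG38 to HG19):
# the G and C means should add up to the reported G + C mean.
gc_from_table = 26.21 + 26.50
print(round(gc_from_table, 2))
```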