Agnese Viluma, Shumaila Sayyab, Sofia Mikko, Göran Andersson, Tomas F Bergström.
Abstract
BACKGROUND: Next-generation sequencing (NGS) has traditionally been performed by large genome centers, but in recent years the costs of whole-genome sequencing (WGS) have decreased substantially. With the introduction of smaller and less expensive "desktop" systems, NGS is now moving into the general laboratory. To evaluate the Ion Proton system for WGS, we sequenced four Chinese Crested dogs and analyzed the data quality in terms of genome and exome coverage, the number of detected single nucleotide variants (SNVs) and insertions and deletions (INDELs), and the genotype concordance with the Illumina HD canine SNP array. For each of the four dogs, a 200 bp fragment library was constructed from genomic DNA and sequenced on two Ion PI chips per dog to reach a mean coverage of 6–8x of the canine genome (genome size ≈ 2.4 Gb).
Keywords: Dog genome; Ion Proton; Next-generation sequencing; Variant detection; Whole-genome sequencing
Year: 2015 PMID: 26457193 PMCID: PMC4599337 DOI: 10.1186/s40575-015-0029-2
Source DB: PubMed Journal: Canine Genet Epidemiol ISSN: 2052-6687
Fig. 1 Cumulative base coverage distribution. Cumulative read depth analysis of the raw binary alignment (BAM) files. Each dog sample was sequenced from one library on two Ion Proton PI chips (9.5 Gb). The x-axis represents the minimum read depth per base and the y-axis the percentage of the genome (left panel) and exome (right panel) that is covered at that depth or higher.
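A cumulative coverage curve of this kind can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it assumes per-base read depths have already been extracted from a BAM file into a plain list, and the function name `cumulative_coverage` and the toy depths are hypothetical.

```python
from collections import Counter


def cumulative_coverage(depths, max_depth=None):
    """Return {d: percent of positions with read depth >= d} for d = 0..max_depth."""
    n = len(depths)
    if n == 0:
        return {}
    counts = Counter(depths)
    top = max(depths) if max_depth is None else max_depth
    curve = {}
    remaining = n  # number of positions with depth >= d, starting at d = 0
    for d in range(top + 1):
        curve[d] = 100.0 * remaining / n
        remaining -= counts.get(d, 0)  # drop positions whose depth is exactly d
    return curve


# Hypothetical per-base depths for ten positions (not data from the study).
toy_depths = [0, 1, 1, 3, 5, 5, 8, 8, 8, 10]
curve = cumulative_coverage(toy_depths, max_depth=8)
```

Each value in `curve` corresponds to one point on the plotted line: the x-value is the minimum depth and the y-value is the share of positions reaching it.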
Fig. 2 GC bias and the normalized coverage. Read coverage over the canine genome with respect to GC content. GC content was calculated in 100 bp windows (red), and the normalized coverage of each window fraction (blue) is plotted against the left y-axis. Mean base quality at each GC % (green) is plotted against the right y-axis.
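The relationship between GC content and coverage can be illustrated with a small sketch. It assumes that normalized coverage means the mean depth of windows at a given GC percentage divided by the mean depth over all windows, which is a common convention but is not spelled out in the caption. The function name, the toy sequence, and the toy depths are all hypothetical.

```python
def gc_normalized_coverage(sequence, depths, window=100):
    """Map GC percentage (rounded to an integer) to normalized coverage.

    Assumption: normalized coverage = mean depth of windows with that GC
    percentage / mean depth over all windows.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    by_gc = {}
    window_means = []
    # Use non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(sequence) - window + 1, window):
        seq = sequence[start:start + window].upper()
        gc_percent = round(100 * (seq.count("G") + seq.count("C")) / window)
        mean_depth = sum(depths[start:start + window]) / window
        by_gc.setdefault(gc_percent, []).append(mean_depth)
        window_means.append(mean_depth)
    if not window_means:
        return {}
    overall = sum(window_means) / len(window_means)
    if overall == 0:
        return {}
    return {gc: (sum(v) / len(v)) / overall for gc, v in sorted(by_gc.items())}


# Hypothetical 200 bp example: an AT-only window at 8x and a GC-only window at 4x.
toy_sequence = "AT" * 50 + "GC" * 50
toy_depths = [8] * 100 + [4] * 100
gc_profile = gc_normalized_coverage(toy_sequence, toy_depths)
```

A value above 1 means windows at that GC content are covered more deeply than average, and a value below 1 means they are under-covered.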
Number of detected variants across different variant calling tools
| | Combined analysis: SAMtools | Combined analysis: UG | Individual analysis (a): SAMtools | Individual analysis (a): UG | Individual analysis (a): HC |
|---|---|---|---|---|---|
| Nr of SNVs (Ti/Tv (b)) | | | | | |
| Total | 5 165 528 (2.03) | 4 802 404 (1.93) | 3 065 136 (2.12) | 2 650 589 (2.04) | 2 410 162 (2.18) |
| Filtered | 4 255 671 (2.19) | 4 471 459 (2.01) | 2 280 929 (2.22) | 2 525 133 (2.09) | 2 363 010 (2.19) |
| Known (a) | 1 423 628 (2.39) | 1 374 703 (2.40) | 860 320 (2.44) | 896 616 (2.42) | 873 093 (2.44) |
| Novel | 2 832 043 (2.10) | 3 096 756 (1.87) | 1 420 609 (2.11) | 1 628 517 (1.94) | 1 489 918 (2.06) |
| Nr of INDELs | | | | | |
| Total | 11 750 679 | 3 539 988 | 3 778 222 | 341 366 | 1 295 497 |
| Filtered | 5 635 914 | 2 764 772 | 1 157 392 | 334 763 | 644 610 |
| Known (c) | 4 188 | 4 129 | 1 493 | 1 001 | 1 422 |
| Novel | 5 631 726 | 2 760 643 | 1 155 899 | 333 761 | 643 188 |

(a) Average result from four individuals; (b) transition/transversion ratio; (c) known variants in dog [16]. UG: UnifiedGenotyper tool; HC: HaplotypeCaller tool.
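The transition/transversion (Ti/Tv) ratio reported in parentheses in the table is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other single-base substitutions). A minimal sketch follows, assuming SNVs are given as hypothetical `(ref, alt)` pairs of single bases with `ref != alt`; the toy list is illustrative and not taken from the study.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Assumes every pair is a valid SNV (single bases, ref != alt).
    """
    snvs = [(ref.upper(), alt.upper()) for ref, alt in snvs]
    transitions = sum(1 for pair in snvs if pair in TRANSITIONS)
    transversions = len(snvs) - transitions
    if transversions == 0:
        return float("inf")
    return transitions / transversions


# Hypothetical variants: four transitions and two transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
toy_ratio = ti_tv_ratio(toy_snvs)
```

In the table, each count is followed by a ratio of this kind computed over the corresponding variant set.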
Fig. 3 Detected variant overlap with already known genetic variation in dogs for SNVs and INDELs. Comparison of two different variant calling tools (SAMtools and UnifiedGenotyper) showing the overlap between detected (a) SNVs and (b) INDELs (yellow circle for SAMtools; blue circle for UnifiedGenotyper) and already known variants in dogs (green circle). The proportion of each overlap category is shown as a percentage of the total number of unique SNVs (4.83 million) or INDELs (6.10 million) detected by both tools. Non-overlapping parts of both tools represent variants detected only with that particular tool.
Fig. 4 Concordance with Illumina HD canine SNP array. (a) Left pie chart: Concordance between genotypes from two individual dogs detected by the 170 k Illumina HD canine SNP array (CanineHD BeadChip) and variants called from NGS data using UnifiedGenotyper. 90.58 % of the genotypes were concordant, 7.30 % were discordant, 0.69 % were called only by UnifiedGenotyper, 1.30 % were called only by the SNP array and 0.12 % failed with both UnifiedGenotyper and the SNP array. (b) Right pie chart: Distribution of discordant genotypes based on the type of discordance. The reference allele is coded with 0 and the alternative allele with 1. Thus 60 % of discordant genotypes were called homozygous for the reference allele with UnifiedGenotyper and heterozygous with the SNP array (0/0:0/1); 17 % (0/1:1/1 or 0/0); 13 % (0/0:1/1); and 10 % (1/1:0/1).
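The five categories in the left pie chart can be reproduced with a simple per-site classification. The sketch below is illustrative only: it assumes genotypes are encoded as strings such as `"0/1"` (as in the caption) with `None` standing for a failed call, and the function names and toy genotype pairs are hypothetical.

```python
from collections import Counter


def normalize(genotype):
    """Sort alleles so that '1/0' and '0/1' compare equal."""
    return tuple(sorted(genotype.split("/")))


def classify(ngs, array):
    """Classify one site given the NGS call and the SNP array call."""
    if ngs is None and array is None:
        return "failed_both"
    if array is None:
        return "ngs_only"
    if ngs is None:
        return "array_only"
    return "concordant" if normalize(ngs) == normalize(array) else "discordant"


def concordance_summary(pairs):
    """Return the percentage of sites in each category."""
    counts = Counter(classify(ngs, array) for ngs, array in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical (NGS, array) genotype pairs, not data from the study.
toy_pairs = [
    ("0/1", "0/1"), ("1/1", "1/1"), ("1/0", "0/1"), ("0/0", "0/0"),
    ("0/0", "0/1"), ("0/1", None), (None, "1/1"), (None, None),
]
summary = concordance_summary(toy_pairs)
```

Discordant sites could be broken down further by the pair of genotypes involved, which is what the right pie chart shows.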
Fig. 5 Proportion of genome- and exome-wide coverage by merging reads from different libraries. Left panel: The proportion of the genome with zero coverage (gray) or with 1–3x coverage (orange) when one library was compared with merging the reads from two, three or four sequencing libraries. With one library sequenced on two Ion Proton PI chips, 3.43 % of the genome was not covered (0 coverage) and 13.72 % had coverage of 1–3x (considered low coverage). After merging the reads from four libraries, 1.06 % had zero and 1.75 % had 1–3x coverage. Right panel: Similarly, the proportion of the exome with zero coverage (gray) or with 1–3x coverage (orange) is shown.
Characterization of regions not covered after merging all four libraries
| Features | Genome-wide | Exome-wide |
|---|---|---|
| Bases not covered | 24 811 567 | 1 384 635 |
| N bases in reference sequence (gaps) | 10 040 013 | 42 019 |
| Total not covered bases excluding gaps | 14 771 554 (0.63 % of genome) | 1 342 616 (2.57 % of exome) |
| Bases not covered in repeat regions | 4 912 354 | 983 511 |
| Bases not covered in CpG Islands | 4 375 887 | 253 643 |
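The third row of the table follows from the first two by subtraction: uncovered bases that fall in reference gaps (N bases) are removed from the uncovered total. The short check below uses only the counts from the table; the variable names are hypothetical, and the share of uncovered bases explained by gaps is a derived figure that the table itself does not report.

```python
# Counts copied from the table above.
not_covered = {"genome": 24_811_567, "exome": 1_384_635}
gap_bases = {"genome": 10_040_013, "exome": 42_019}

# Uncovered bases that are not explained by gaps in the reference sequence.
excluding_gaps = {k: not_covered[k] - gap_bases[k] for k in not_covered}

# Derived: percentage of uncovered bases that lie in reference gaps.
gap_share = {k: 100.0 * gap_bases[k] / not_covered[k] for k in not_covered}

for region in not_covered:
    print(f"{region}: {excluding_gaps[region]:,} uncovered bases outside gaps "
          f"({gap_share[region]:.2f} % of uncovered bases are gaps)")
```

The two differences match the "Total not covered bases excluding gaps" row of the table.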