Ikuko N Motoike, Mitsuyo Matsumoto, Inaho Danjoh, Fumiki Katsuoka, Kaname Kojima, Naoki Nariai, Yukuto Sato, Yumi Yamaguchi-Kabata, Shin Ito, Hisaaki Kudo, Ichiko Nishijima, Satoshi Nishikawa, Xiaoqing Pan, Rumiko Saito, Sakae Saito, Tomo Saito, Matsuyuki Shirota, Kaoru Tsuda, Junji Yokozawa, Kazuhiko Igarashi, Naoko Minegishi, Osamu Tanabe, Nobuo Fuse, Masao Nagasaki, Kengo Kinoshita, Jun Yasuda, Masayuki Yamamoto.
Abstract
BACKGROUND: Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously.
Year: 2014 PMID: 25109789 PMCID: PMC4138778 DOI: 10.1186/1471-2164-15-673
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Basic statistics of the whole genome sequences generated by the HiSeq sequencer
| Samples | Total bases (GB) | Average depth | Read 2 %Q30: 100–150 cycles | Aligned bases (GB) | Mapping ratio (%) | SNPs |
|---|---|---|---|---|---|---|
| 01 | 103 | 33.6 | 0.740 | 101 | 98 | 3,631,549 |
| 02 | 100 | 32.4 | 0.670 | 97.3 | 97 | 3,606,901 |
| 03 | 106 | 34.3 | 0.600 | 97.1 | 92 | 3,597,816 |
| 04 | 104 | 34.0 | 0.710 | 101 | 96 | 3,625,724 |
| 05 | 96 | 30.6 | 0.670 | 91.4 | 95 | 3,601,895 |
| 06 | 99 | 30.8 | 0.660 | 96.0 | 97 | 3,604,534 |
| 07 | 96 | 31.4 | 0.690 | 93.9 | 98 | 3,588,904 |
| 08 | 97 | 31.1 | 0.590 | 90.2 | 93 | 3,598,436 |
| 09 | 104 | 33.2 | 0.610 | 96.8 | 93 | 3,603,430 |
| 10 | 96 | 31.5 | 0.740 | 92.6 | 97 | 3,601,931 |
| 11 | 106 | 34.4 | 0.740 | 104 | 98 | 3,616,799 |
| 12 | 100 | 31.3 | 0.520 | 94.1 | 94 | 3,569,104 |
| Mean | 101 | 32.4 | 0.662 | 96.3 | 95.7 | 3,603,919 |
| %CV | 3.7 | 4.3 | 10 | 4.2 | 2.2 | 0.43 |
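The %CV row can be checked with a few lines of Python. This is a minimal sketch, not the authors' procedure: the record does not say whether a population or sample standard deviation was used, so the population form (`pstdev`) is an assumption here, and the helper name `percent_cv` is hypothetical. With the Total bases column, the population form gives a value that rounds to the reported 3.7.

```python
from statistics import mean, pstdev


def percent_cv(values):
    """Coefficient of variation in percent: 100 * SD / mean.

    Assumption: population SD (pstdev); the source does not state which
    SD estimator was used for the %CV rows.
    """
    return 100.0 * pstdev(values) / mean(values)


# Total bases (GB) for Samples 01-12, copied from the HiSeq table.
total_bases_gb = [103, 100, 106, 104, 96, 99, 96, 97, 104, 96, 106, 100]

print(f"mean = {mean(total_bases_gb):.1f} GB")
print(f"%CV  = {percent_cv(total_bases_gb):.1f}")
```

Swapping `pstdev` for `stdev` gives the sample-SD variant, which is slightly larger for 12 samples.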
Basic statistics of whole exome sequences generated by the Ion Proton sequencer
| Samples | Total bases (GB) | Average depth | Average read length (bp) | Aligned bases (GB) | Aligned bases (%) | SNPs |
|---|---|---|---|---|---|---|
| 01_1 | 6.77 | 65.9 | 121 | 6.40 | 96 | 58037 |
| 01_2 | 8.37 | 85.1 | 127 | 7.90 | 95 | |
| 02 | 10.8 | 122 | 142 | 10.4 | 98 | 69667 |
| 03_1 | 8.24 | 78.5 | 144 | 8.00 | 97 | 67493 |
| 03_2 | 8.93 | 93.0 | 139 | 8.60 | 97 | |
| 04 | 11.2 | 126 | 149 | 10.8 | 97 | 53811 |
| 05 | 12.2 | 147 | 148 | 11.9 | 98 | 46923 |
| 06 | 9.25 | 94.5 | 149 | 9.00 | 98 | 52516 |
| 07 | 12.1 | 135 | 149 | 11.8 | 98 | 53993 |
| 08 | 10.7 | 103 | 138 | 10.3 | 96 | 52243 |
| 09 | 10.9 | 121 | 137 | 10.5 | 97 | 50166 |
| 10 | 11.6 | 136 | 142 | 11.3 | 97 | 50781 |
| 11 | 10.4 | 85.5 | 134 | 10.0 | 97 | 67810 |
| 12 | 11.3 | 133 | 146 | 10.9 | 97 | 54335 |
| Mean | 10.2 | 109 | 140 | 9.84 | 97.0 | 56481.25 |
| %CV | 15 | 23 | 5.9 | 16 | 0.87 | 12.94 |
Two independent runs (indicated as _1 and _2) were performed for Samples 01 and 03.
Concordant SNP calls made from the HiSeq, Ion Proton, and Omni 2.5-8 SNP array data
| Samples | Total calls | Omni 2.5 | HiSeq | Proton | All concordant | HO concordant | HP concordant | OP concordant | Proton support |
|---|---|---|---|---|---|---|---|---|---|
| 01 | 17031 | 16902 | 16782 | 15678 | 15549 | 16733 | 15576 | 15571 | 0.929 |
| 02 | 17168 | 17007 | 16872 | 14033 | 13879 | 16820 | 13896 | 13907 | 0.825 |
| 03 | 17156 | 17009 | 16860 | 15025 | 14855 | 16806 | 14889 | 14898 | 0.884 |
| 04 | 16964 | 16847 | 16712 | 15668 | 15552 | 16663 | 15580 | 15572 | 0.933 |
| 05 | 17029 | 16930 | 16782 | 16326 | 16208 | 16732 | 16238 | 16247 | 0.969 |
| 06 | 17151 | 17033 | 16873 | 15863 | 15723 | 16832 | 15744 | 15765 | 0.934 |
| 07 | 17193 | 17058 | 16918 | 16060 | 15905 | 16862 | 15939 | 15947 | 0.943 |
| 08 | 17223 | 17099 | 16939 | 16095 | 15935 | 16892 | 15965 | 15988 | 0.943 |
| 09 | 17282 | 17173 | 17017 | 16220 | 16084 | 16965 | 16114 | 16133 | 0.948 |
| 10 | 17030 | 16922 | 16800 | 16180 | 16067 | 16747 | 16098 | 16094 | 0.959 |
| 11 | 17086 | 16944 | 16802 | 14332 | 14182 | 16768 | 14199 | 14207 | 0.846 |
| 12 | 17234 | 17100 | 16938 | 16163 | 15996 | 16889 | 16028 | 16046 | 0.947 |
| Average | 17129 | 17002 | 16858 | 15637 | 15495 | 16809 | 15522 | 15531 | 0.922 |
| %CV | 0.56 | 0.54 | 0.49 | 4.7 | 4.8 | 0.49 | 4.8 | 4.8 | 4.7 |
HO concordant: HiSeq 2500 and Omni 2.5-8 chip SNP calls were concordant but not the Ion Proton calls. HP concordant: HiSeq 2500 and Ion Proton SNP calls were concordant but not the Omni 2.5-8 chip calls. OP concordant: Omni 2.5-8 chip and Ion Proton SNP calls were concordant but not the HiSeq 2500 calls. Proton support = All concordant / HO concordant.
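As a quick arithmetic check of the Proton support definition, the ratio can be recomputed from the table columns. The sketch below uses the values for Samples 01 and 02 from the table; the function name `proton_support` is a hypothetical helper, not something from the source.

```python
def proton_support(all_concordant, ho_concordant):
    """Proton support = All concordant / HO concordant, as defined in the table note."""
    return all_concordant / ho_concordant


# (All concordant, HO concordant) for Samples 01 and 02, copied from the table.
rows = {"01": (15549, 16733), "02": (13879, 16820)}

for sample, (all_c, ho_c) in rows.items():
    print(sample, round(proton_support(all_c, ho_c), 3))
```

Both ratios round to the values listed in the Proton support column (0.929 and 0.825).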
Figure 1. Venn diagram of SNP calls for each sample shared among the three platforms. The numbers in each area indicate the number of SNPs. The size of the area is not proportional.
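The seven regions of a three-platform Venn diagram can be derived with plain set operations. The following is an illustrative sketch only: the SNP identifiers are made up, and representing each call as a bare identifier (rather than a locus plus genotype) is a simplifying assumption, not the authors' data model.

```python
def venn3_regions(hiseq, omni, proton):
    """Split three call sets into the seven disjoint regions of a Venn diagram."""
    return {
        "all_three": hiseq & omni & proton,
        "hiseq_omni_only": (hiseq & omni) - proton,
        "hiseq_proton_only": (hiseq & proton) - omni,
        "omni_proton_only": (omni & proton) - hiseq,
        "hiseq_only": hiseq - omni - proton,
        "omni_only": omni - hiseq - proton,
        "proton_only": proton - hiseq - omni,
    }


# Hypothetical toy call sets, not data from the study.
hiseq = {"rs1", "rs2", "rs3", "rs4"}
omni = {"rs1", "rs2", "rs3", "rs5"}
proton = {"rs1", "rs2", "rs4", "rs6"}

regions = venn3_regions(hiseq, omni, proton)
for name, members in regions.items():
    print(f"{name}: {len(members)}")
```

Because the regions are disjoint, their sizes sum to the size of the union of the three sets, which is a convenient sanity check.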
Figure 2. Level plots of alternative allele counts among the three platforms. Scatter plots of alternative allele counts between A) HiSeq and Omni 2.5-8, B) HiSeq and Proton, and C) Proton and Omni 2.5-8 are indicated. The vertical and horizontal axes show the numbers of alternative allele calls for each platform. The color bar indicates the number of SNPs in a pixel. There were 12 individual samples, so the maximum number of alternative alleles is 24. The digits with arrows near the corner indicate the numbers of SNPs at the corner pixels.
Average numbers of NGS SNP calls showing concordance or discordance with Omni 2.5-8
| | Proton concordant & HiSeq discordant | HiSeq concordant & Proton discordant | All concordant | Both discordant: Total | Both discordant: HiSeq = Proton | Both discordant: All discordant |
|---|---|---|---|---|---|---|
| Average | 64 | 1849 | 76945 | 216 | 202 | 14 |
| %CV | 19.6 | 55.6 | 1.24 | 4.23 | 5.51 | 31.6 |
| Common | 1 | 43 | 66472 | 80 | 67 | 1 |
Figure 3. Box plots of read depths at concordant/discordant SNP loci among the NGSs and Omni 2.5-8. A) Box plots of the NGS read depth at the SNP loci showing concordance/discordance between the NGSs and Omni 2.5-8. The box indicates the first and third quartiles and the lines indicate the highest and lowest values that are within 1.5 × the interquartile range. Outliers were omitted. The graph on the left indicates the distribution of read depths at HiSeq 2500 SNP loci and the graph on the right indicates the distribution of read depths at Ion Proton and Omni 2.5-8 SNP loci. B) Box plots of the NGS read depths at the SNP loci showing concordance/discordance among the three platforms. The box indicates the first and third quartiles and the lines indicate the highest and lowest values that are within 1.5 × the interquartile range. Outliers were omitted. The three columns (shaded gray rectangle) from the right indicate the distribution of read depths of the discordant SNPs between the NGSs and Omni 2.5-8. The read depth at SNP loci that showed Ion Proton discordance against HiSeq 2500 and Omni 2.5-8 was significantly different. The read depths at SNP loci for SNPs that both HiSeq 2500 and Ion Proton failed to call are shown in the gray box.
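The box-and-whisker convention described in the caption (quartile box, whiskers at the most extreme values within 1.5 × the interquartile range, outliers dropped) can be sketched as follows. The quartile interpolation method (`method="inclusive"`) is an assumption, since plotting tools differ on this, and the depth values are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # Assumption: inclusive quartile interpolation; other tools may differ.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],    # lowest value within the fence
        "whisker_high": inside[-1],  # highest value within the fence
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical read depths at ten loci, not data from the study.
depths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_stats(depths))
```

Here the depth of 100 falls outside the upper fence, so the upper whisker stops at 9 and 100 would be omitted from a plot drawn the way the caption describes.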
Figure 4. Box plots of GC content at concordant/discordant SNP loci among the NGSs and Omni 2.5-8. A) Box plots of the GC content at the SNP loci showing concordance/discordance between the NGSs and Omni 2.5-8. The box indicates the first and third quartiles and the lines indicate the highest and lowest values that are within 1.5 × the interquartile range. The graph on the left indicates the distribution of GC content at HiSeq SNP loci. There was no statistically significant difference in GC content between the concordant and discordant SNPs (p = 0.81). The graph on the right indicates the distribution of GC content at Ion Proton and Omni 2.5-8 SNP loci. The difference in GC content between concordant and discordant SNPs was significant (p = 2.2 × 10^-16). B) Box plots of the GC content at the SNP loci showing concordance/discordance among the three platforms. The three columns (shaded gray rectangle) from the right indicate the distribution of GC content of the discordant SNPs between the NGSs and Omni 2.5-8. The GC content at SNP loci that showed Ion Proton discordance compared with HiSeq and Omni 2.5-8 was significantly different (p = 0.026).
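GC content at a SNP locus is typically computed over a window of reference sequence around the site. The sketch below assumes a symmetric window whose half-width (`flank`) is a hypothetical parameter; the window size the authors used is not given in this record.

```python
def gc_content(sequence, center, flank=50):
    """Fraction of G/C bases in a window of +/- `flank` bases around `center`.

    `center` is a 0-based index; the window is clipped at the sequence ends.
    """
    start = max(0, center - flank)
    window = sequence[start:center + flank + 1].upper()
    if not window:
        raise ValueError("empty window: check sequence and center")
    gc = sum(1 for base in window if base in ("G", "C"))
    return gc / len(window)


# Hypothetical toy sequence: the window around index 3 with flank=2 is "TGCGC".
print(gc_content("ATGCGCAT", center=3, flank=2))
```

Ambiguous bases such as N count toward the window length but not toward GC in this sketch; a real pipeline might prefer to exclude them.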
Figure 5. Distribution of homopolymer lengths at concordant/discordant SNP loci among the three platforms. The vertical axis of each panel indicates the density of each homopolymer at the SNP loci, calculating the total number of SNPs for each category as 1. The horizontal axis of each panel indicates the length of the homopolymers.
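The normalization in the caption (each category's SNP total scaled to 1) amounts to a relative-frequency distribution over homopolymer lengths. The sketch below shows one way this could be computed. Defining the homopolymer at a locus as the run of identical bases containing that position is an assumption, and both function names are hypothetical.

```python
from collections import Counter


def homopolymer_length(sequence, pos):
    """Length of the run of identical bases that contains 0-based index `pos`."""
    base = sequence[pos]
    left = pos
    while left > 0 and sequence[left - 1] == base:
        left -= 1
    right = pos
    while right < len(sequence) - 1 and sequence[right + 1] == base:
        right += 1
    return right - left + 1


def length_density(lengths):
    """Relative frequency of each length, so the values sum to 1."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical toy reference and loci, not data from the study.
reference = "ACGTTTTAG"
loci = [0, 1, 4, 8]
lengths = [homopolymer_length(reference, p) for p in loci]
print(length_density(lengths))
```

Computing one such distribution per concordance category gives panels that are comparable even when the categories differ greatly in SNP count.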
Logistic regression analysis among QC factors of NGS and calling discordance with Omni 2.5-8
| Platform | A (Read depth) | B (GC content) | C (Homopolymer) | A + B | A + C | A + B + C |
|---|---|---|---|---|---|---|
| HiSeq 2500 | 0.0854 | 0.0021 | 0.000074 | 0.0896 | 0.104 | 0.106 |
| Ion Proton | 0.0759 | 0.0051 | 0.0123 | 0.0837 | 0.0759 | 0.0839 |