M Michelle Malmberg1,2, Denise M Barbulescu3, Michelle C Drayton1, Maiko Shinozuka1, Preeti Thakur1, Yvonne O Ogaji1, German C Spangenberg1,2, Hans D Daetwyler1,2, Noel O I Cogan1,2.
Abstract
Whole genome sequencing offers genome-wide, unbiased markers and inexpensive library preparation. With the cost of sequencing decreasing rapidly, many plant genomes of modest size are amenable to skim whole genome resequencing (skim WGR). The use of skim WGR in diverse sample sets without imputation was evaluated in silico in 149 canola samples representative of global diversity. Fastq files with an average of 10x coverage of the reference genome were used to generate skim samples representing 0.25x, 0.5x, 1x, 2x, 3x, 4x, and 5x sequencing coverage. Genotyping a pre-defined list of SNPs was compared with de novo SNP discovery. As skim WGR is expected to result in some degree of insufficient allele sampling, all skim coverage levels were filtered at a range of minimum read depths, from a relaxed minimum read depth of 2 to a stringent read depth of 5, resulting in 28 list-based SNP sets. As a broad recommendation, genotyping pre-defined SNPs at between 1x and 2x coverage with relatively stringent depth filtering is appropriate for a diverse sample set of canola because it balances marker number, sufficient accuracy, and sequencing cost, although the best choice depends on the intended application. This was experimentally examined in two sample sets with different genetic backgrounds: 1x coverage of 1,590 individuals from 84 Australian spring type four-parent crosses aimed at maximizing diversity as well as one commercial F1 hybrid, and 2x coverage of 379 doubled haploids (DHs) derived from a subset of the four-parent crosses. To determine optimal coverage in a simpler genetic background, the DH sample sequence coverage was further downsampled in silico. The flexible and cost-effective nature of the protocol makes it highly applicable across a range of species and purposes.
Keywords: Brassica napus; GBS; doubled haploid; low coverage; plant
Year: 2018; PMID: 30581450; PMCID: PMC6292936; DOI: 10.3389/fpls.2018.01809
Source DB: PubMed; Journal: Front Plant Sci; ISSN: 1664-462X; Impact factor: 5.753
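The in silico design described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each skim level is produced by keeping a random fraction of reads equal to the target coverage divided by the original 10x average coverage, and the function names (`subsample_fraction`, `downsample`, `skim_design`) are hypothetical.

```python
import random
from itertools import product

FULL_COVERAGE = 10.0                      # average coverage of the original fastq files
SKIM_LEVELS = [0.25, 0.5, 1, 2, 3, 4, 5]  # target coverage levels from the abstract
MIN_DEPTHS = [2, 3, 4, 5]                 # minimum read depth filters from the abstract


def subsample_fraction(target, full=FULL_COVERAGE):
    """Fraction of reads to keep to reach the target coverage (assumed approach)."""
    if not 0 < target <= full:
        raise ValueError("target coverage must be in (0, full]")
    return target / full


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def skim_design():
    """Enumerate every (coverage level, depth filter) combination."""
    return list(product(SKIM_LEVELS, MIN_DEPTHS))


design = skim_design()
print(len(design))               # 28 list-based SNP sets
print(subsample_fraction(0.25))  # 0.025
```

Seven coverage levels crossed with four depth filters give the 28 list-based SNP sets mentioned in the abstract.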
FIGURE 1. The number of high-confidence SNPs in the global diversity panel for each skim coverage level (0.25x, 0.5x, 1x–5x) and minimum read depth (dp 5, dp 4, dp 3, and dp 2).
FIGURE 2. Relative distribution of high-confidence SNPs across chromosomes in the global diversity panel skim SNP sets filtered to a minimum read depth of (A) dp 5, (B) dp 4, (C) dp 3, and (D) dp 2. SNP density between chromosomes within each SNP set (skim coverage level and depth filter combination) was compared, with the highest SNP density assigned a value of 1 and all other chromosomes assigned a value relative to this.
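The normalization described in the Figure 2 caption can be sketched as below. The caption does not say how SNP density was measured, so this sketch assumes SNPs per base pair; the chromosome names, counts, and lengths are hypothetical values chosen only for illustration.

```python
def relative_snp_density(snp_counts, chrom_lengths):
    """Scale per-chromosome SNP density so the densest chromosome equals 1."""
    # Assumption: density is SNPs per base pair of chromosome length.
    density = {c: snp_counts[c] / chrom_lengths[c] for c in snp_counts}
    top = max(density.values())
    if top == 0:
        return {c: 0.0 for c in density}
    return {c: d / top for c, d in density.items()}


# Hypothetical SNP counts and chromosome lengths (bp) for one SNP set.
counts = {"A01": 300, "A02": 100, "C01": 150}
lengths = {"A01": 30_000_000, "A02": 20_000_000, "C01": 30_000_000}

rel = relative_snp_density(counts, lengths)
print({c: round(v, 2) for c, v in rel.items()})
```

Applying the function separately to each coverage level and depth filter combination keeps the values comparable within a SNP set, as the caption describes.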
FIGURE 3. Accuracy of genotype calls in the (A) in silico skim global diversity panel compared to the original 10x sequencing data and (B) DH skim data sets compared to the original 2x DH sequencing data. The percentage of genotypes across the whole genotype matrix (each SNP in each individual) within each skim SNP set is represented. Green bars represent genotype calls that are consistent with the corresponding full sequencing data and orange bars represent genotype calls that do not match. The gray/black bars indicate (A) missing genotypes in the in silico skim global diversity panel and (B) genotypes present in the DH skim data that could not be evaluated for accuracy due to insufficient read depth in the original 2x data.
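The categories plotted in Figure 3 can be reproduced from a pair of genotype matrices along the following lines. This is a sketch under assumptions: genotypes are represented as strings with `None` for an absent call, and the function name `concordance_summary` and the toy matrices are hypothetical.

```python
def concordance_summary(skim, full):
    """Classify every genotype in the skim matrix against the full-coverage matrix.

    Both arguments are lists of rows (one per individual); each row holds one
    genotype call per SNP, or None where no call was made.
    Returns the percentage of the whole matrix in each category.
    """
    counts = {"match": 0, "mismatch": 0, "missing": 0, "unevaluated": 0}
    for skim_row, full_row in zip(skim, full):
        for s, f in zip(skim_row, full_row):
            if s is None:
                counts["missing"] += 1        # no call in the skim data
            elif f is None:
                counts["unevaluated"] += 1    # skim call with no reference call
            elif s == f:
                counts["match"] += 1
            else:
                counts["mismatch"] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: 100.0 * v / total for k, v in counts.items()}


# Toy example: 2 individuals x 4 SNPs.
skim = [["AA", "AB", None, "BB"], ["AA", "BB", "AB", None]]
full = [["AA", "AA", "BB", "BB"], ["AA", "BB", None, "AB"]]

summary = concordance_summary(skim, full)
print(summary)
# {'match': 50.0, 'mismatch': 12.5, 'missing': 25.0, 'unevaluated': 12.5}
```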
Cumulative number of SNPs (% accuracy of called genotypes) in 10% increments of maximum missing data accepted per SNP, for each sequencing coverage level and minimum depth filter, in the skim global diversity panel.
| Skim level | Minimum read depth | <10% missing | <20% missing | <30% missing | <40% missing | <50% missing |
|---|---|---|---|---|---|---|
| 0.25x | 5 | 39 (92.1) | 54 (91.9) | 65 (92.1) | 84 (91.9) | 127 (92.2) |
| 0.25x | 4 | 42 (93.2) | 63 (92.5) | 76 (92.5) | 129 (92.2) | 206 (91.9) |
| 0.25x | 3 | 50 (92.9) | 79 (91.9) | 132 (91.4) | 264 (91.1) | 529 (90.8) |
| 0.25x | 2 | 70 (91.6) | 180 (90.6) | 496 (89.4) | 1,326 (88.3) | 3,137 (87.4) |
| 0.5x | 5 | 62 (93.2) | 104 (92.9) | 188 (92.9) | 343 (92.6) | 591 (92.5) |
| 0.5x | 4 | 79 (92.5) | 165 (92.4) | 346 (92.2) | 655 (91.8) | 1,263 (91.6) |
| 0.5x | 3 | 128 (92.7) | 327 (92.3) | 749 (91.6) | 1,629 (91.1) | 3,293 (90.9) |
| 0.5x | 2 | 288 (91.8) | 1,161 (90.3) | 3,217 (89.5) | 7,655 (89.0) | 17,533 (89.0) |
| 1x | 5 | 229 (92.8) | 559 (93.0) | 1,180 (93.2) | 2,308 (93.3) | 4,165 (93.5) |
| 1x | 4 | 347 (92.9) | 1,039 (93.0) | 2,415 (93.1) | 4,992 (93.2) | 10,268 (93.6) |
| 1x | 3 | 617 (92.9) | 2,375 (92.9) | 5,834 (93.1) | 14,012 (93.3) | 33,158 (93.6) |
| 1x | 2 | 1,968 (92.4) | 8,868 (92.3) | 26,345 (92.5) | 73,089 (92.5) | 181,106 (92.5) |
| 2x | 5 | 1,511 (94.5) | 4,900 (95.2) | 11,761 (95.8) | 27,759 (96.3) | 60,939 (96.6) |
| 2x | 4 | 2,834 (94.8) | 10,654 (95.5) | 29,677 (96.1) | 74,253 (96.3) | 165,322 (96.4) |
| 2x | 3 | 6,193 (95.2) | 29,664 (95.8) | 88,594 (95.8) | 219,662 (95.7) | 459,008 (95.6) |
| 2x | 2 | 22,865 (95.3) | 121,324 (95.0) | 332,147 (94.7) | 697,586 (94.4) | 1,193,368 (94.2) |
| 3x | 5 | 6,191 (96.0) | 24,883 (96.9) | 65,784 (97.3) | 149,552 (97.4) | 298,103 (97.5) |
| 3x | 4 | 12,760 (96.5) | 58,138 (97.0) | 157,952 (97.1) | 348,461 (97.1) | 655,669 (97.0) |
| 3x | 3 | 32,584 (96.7) | 155,568 (96.7) | 399,127 (96.5) | 796,368 (96.3) | 1,333,089 (96.2) |
| 3x | 2 | 117,715 (96.3) | 480,069 (95.8) | 996,611 (95.4) | 1,616,262 (95.2) | 2,229,034 (95.1) |
| 4x | 5 | 19,832 (97.2) | 83,871 (97.7) | 209,245 (97.8) | 423,721 (97.8) | 741,998 (97.8) |
| 4x | 4 | 42,843 (97.5) | 184,749 (97.6) | 439,851 (97.6) | 832,678 (97.5) | 1,349,178 (97.4) |
| 4x | 3 | 106,892 (97.4) | 427,994 (97.1) | 914,746 (96.9) | 1,539,505 (96.8) | 2,223,904 (96.7) |
| 4x | 2 | 323,132 (96.8) | 1,004,739 (96.3) | 1,719,151 (96.1) | 2,397,603 (95.9) | 2,929,725 (95.8) |
| 5x | 5 | 51,017 (97.9) | 200,955 (98.1) | 451,492 (98.1) | 817,461 (98.1) | 1,286,027 (98.1) |
| 5x | 4 | 106,435 (97.9) | 401,047 (97.9) | 836,977 (97.9) | 1,400,574 (97.8) | 2,038,444 (97.8) |
| 5x | 3 | 243,287 (97.7) | 805,789 (97.5) | 1,488,739 (97.3) | 2,227,176 (97.2) | 2,935,573 (97.1) |
| 5x | 2 | 618,121 (97.2) | 1,545,888 (96.8) | 2,329,718 (96.6) | 2,959,233 (96.5) | 3,305,828 (96.5) |
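The cumulative counts in this table can be computed with a loop like the one below. It is only a sketch: it assumes each SNP is summarized by its missing-data rate, the number of genotypes called, and the number of those calls that agree with the full-coverage data, and that accuracy is pooled over all called genotypes of the retained SNPs. The function name and the toy values are hypothetical.

```python
def cumulative_by_missing(snps, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Count SNPs below each missing-data threshold and pool their accuracy.

    snps: list of (missing_rate, n_called, n_correct) tuples, one per SNP.
    Returns a list of (threshold, n_snps, accuracy_percent or None).
    """
    rows = []
    for t in thresholds:
        kept = [s for s in snps if s[0] < t]   # strict "<", as in the column headers
        called = sum(s[1] for s in kept)
        correct = sum(s[2] for s in kept)
        accuracy = 100.0 * correct / called if called else None
        rows.append((t, len(kept), accuracy))
    return rows


# Toy input: four SNPs scored in 100 individuals.
toy_snps = [(0.05, 95, 90), (0.15, 85, 80), (0.35, 65, 60), (0.60, 40, 30)]

for threshold, n_snps, accuracy in cumulative_by_missing(toy_snps):
    print(f"<{threshold:.0%}: {n_snps} SNPs, accuracy {accuracy:.1f}%")
```

Because each threshold includes every SNP admitted by the stricter thresholds, the counts can only grow from left to right, which matches the pattern in every row of the table.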
Number of filtered SNPs generated through de novo SNP discovery in the skim global diversity panel.
| Skim level | Minimum read depth | No. of SNPs (de novo) | % of list-based SNPs identified de novo | Fold increase in SNP no. | Increase in SNP no. |
|---|---|---|---|---|---|
| 0.25x | 5 | 5,591 | 100 | 44.0 | 5,464 |
| 0.25x | 4 | 7,573 | 100 | 36.8 | 7,367 |
| 0.25x | 3 | 10,310 | 100 | 19.5 | 9,781 |
| 0.25x | 2 | 16,674 | 99.94 | 5.3 | 13,537 |
| 0.5x | 5 | 11,224 | 100 | 19.0 | 10,633 |
| 0.5x | 4 | 14,659 | 99.92 | 11.6 | 13,396 |
| 0.5x | 3 | 20,245 | 99.97 | 6.1 | 16,952 |
| 0.5x | 2 | 39,520 | 99.99 | 2.3 | 21,987 |
| 1x | 5 | 23,157 | 99.95 | 5.6 | 18,992 |
| 1x | 4 | 33,985 | 99.98 | 3.3 | 23,717 |
| 1x | 3 | 63,924 | 99.99 | 1.9 | 30,766 |
| 1x | 2 | 225,924 | 99.998 | 1.2 | 44,818 |
| 2x | 5 | 100,028 | 99.98 | 1.6 | 39,089 |
| 2x | 4 | 220,638 | 99.99 | 1.3 | 55,316 |
| 2x | 3 | 541,071 | 99.997 | 1.2 | 82,063 |
| 2x | 2 | 1,309,224 | 99.999 | 1.1 | 115,856 |
| 3x | 5 | 354,623 | 97.28 | 1.2 | 56,520 |
| 3x | 4 | 732,973 | 97.72 | 1.1 | 77,304 |
| 3x | 3 | 1,429,815 | 97.81 | 1.1 | 96,726 |
| 3x | 2 | 2,354,165 | 98.34 | 1.1 | 125,131 |
| 4x | 5 | 825,813 | 97.59 | 1.1 | 83,815 |
| 4x | 4 | 1,456,258 | 97.83 | 1.1 | 107,080 |
| 4x | 3 | 2,348,359 | 97.92 | 1.1 | 124,455 |
| 4x | 2 | 3,121,869 | 98.50 | 1.1 | 192,144 |
| 5x | 5 | 1,394,073 | 97.65 | 1.1 | 108,046 |
| 5x | 4 | 2,167,536 | 97.80 | 1.1 | 129,092 |
| 5x | 3 | 3,080,766 | 97.9 | 1.0 | 145,193 |
| 5x | 2 | 3,650,078 | 98.59 | 1.1 | 344,248 |
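The fold-increase column can be checked with simple arithmetic. The sketch below rests on an inference rather than a statement in the source: at the lower coverage levels, the de novo count minus the reported increase equals the list-based count in the <50% missing data column of the previous table (for example 5,591 − 5,464 = 127 at 0.25x, dp 5), so the fold increase appears to be the de novo count divided by that list-based count. The function name is hypothetical.

```python
def fold_increase(de_novo, increase):
    """Return (implied list-based SNP count, fold increase of de novo over it)."""
    list_based = de_novo - increase
    if list_based <= 0:
        raise ValueError("increase must be smaller than the de novo count")
    return list_based, de_novo / list_based


# (skim level, min depth): (de novo SNPs, increase in SNP no.), copied from the table.
rows = {
    ("0.25x", 5): (5591, 5464),
    ("0.25x", 2): (16674, 13537),
    ("1x", 2): (225924, 44818),
}

for key, (de_novo, increase) in rows.items():
    list_based, fold = fold_increase(de_novo, increase)
    print(key, list_based, round(fold, 1))
# ('0.25x', 5) 127 44.0
# ('0.25x', 2) 3137 5.3
# ('1x', 2) 181106 1.2
```

The relationship is exact only where essentially all list-based SNPs were rediscovered; at 3x and above, where that percentage falls below 100, small differences are expected.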
SNPs remaining from the 4M SNP list in the four-parent crosses after filtering for MAF of 0.01 and maximum missing data of 0.5, at each minimum read depth (2, 3, 4, and 5).
| Minimum read depth | No. of SNPs | % residual missing data | Total data points (SNPs × individuals) | No. of missing genotypes |
|---|---|---|---|---|
| 5 | 8,538 | 33.9 | 13,575,420 | 4,607,124 |
| 4 | 19,073 | 36.5 | 30,326,070 | 11,060,070 |
| 3 | 43,799 | 38.6 | 69,640,410 | 26,852,967 |
| 2 | 264,952 | 39.9 | 421,273,680 | 168,083,777 |
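The derived columns of this table follow directly from the SNP counts and the 1,590 individuals reported in the abstract. The short check below recomputes them; the variable names are illustrative only.

```python
N_INDIVIDUALS = 1590  # individuals sequenced at 1x, as stated in the abstract

# min read depth: (no. of SNPs, no. of missing genotypes), copied from the table.
table = {
    5: (8538, 4607124),
    4: (19073, 11060070),
    3: (43799, 26852967),
    2: (264952, 168083777),
}

summary = {}
for depth, (n_snps, n_missing) in table.items():
    total_points = n_snps * N_INDIVIDUALS          # SNPs x individuals
    pct_missing = 100.0 * n_missing / total_points  # % residual missing data
    summary[depth] = (total_points, round(pct_missing, 1))
    print(depth, total_points, round(pct_missing, 1))
# 5 13575420 33.9
# 4 30326070 36.5
# 3 69640410 38.6
# 2 421273680 39.9
```

Relaxing the depth filter from 5 to 2 multiplies the marker count roughly 31-fold while the residual missing data rises by only about six percentage points.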
The average, minimum, and maximum number of SNPs genotyped in individuals in the DH samples for each sequencing coverage level and filtering depth.
| Skim level | Minimum read depth | Average no. of SNPs | Minimum no. of SNPs | Maximum no. of SNPs |
|---|---|---|---|---|
| 0.25x | 5 | 24,144 | 666 | 47,800 |
| 0.25x | 4 | 56,014 | 1,531 | 104,674 |
| 0.25x | 3 | 82,229 | 2,030 | 154,165 |
| 0.25x | 2 | 243,102 | 13,850 | 404,436 |
| 0.25x | 1 | 426,370 | 31,188 | 700,290 |
| 0.5x | 5 | 62,917 | 1,216 | 128,273 |
| 0.5x | 4 | 127,748 | 2,957 | 241,270 |
| 0.5x | 3 | 189,495 | 4,062 | 360,101 |
| 0.5x | 2 | 460,405 | 26,899 | 740,728 |
| 0.5x | 1 | 755,794 | 60,435 | 1,192,737 |
| 1x | 5 | 173,476 | 2,294 | 358,718 |
| 1x | 4 | 299,618 | 5,920 | 553,618 |
| 1x | 3 | 429,318 | 8,738 | 787,243 |
| 1x | 2 | 836,761 | 53,316 | 1,292,317 |
| 1x | 1 | 1,254,796 | 117,965 | 1,860,615 |
| 2x | 5 | 511,008 | 6,083 | 983,954 |
| 2x | 4 | 736,420 | 14,846 | 1,281,324 |
| 2x | 3 | 969,265 | 23,991 | 1,611,694 |
| 2x | 2 | 1,489,570 | 118,138 | 2,144,288 |
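Per-individual summaries of this kind can be derived from a matrix of read depths, as sketched below. The assumption is that a SNP counts as genotyped in an individual when its read depth meets the minimum depth filter; the function name and the toy depth matrix are hypothetical.

```python
from statistics import mean


def genotyped_per_individual(depths, min_depth):
    """Summarize how many SNPs pass the depth filter in each individual.

    depths: one list of per-SNP read depths for each individual.
    Returns (average, minimum, maximum) number of SNPs genotyped.
    """
    counts = [sum(1 for d in row if d >= min_depth) for row in depths]
    return mean(counts), min(counts), max(counts)


# Toy data: 3 individuals x 4 SNPs.
toy_depths = [
    [0, 1, 2, 5],
    [3, 3, 0, 1],
    [6, 2, 2, 2],
]

for dp in (5, 2, 1):
    avg, low, high = genotyped_per_individual(toy_depths, dp)
    print(f"dp {dp}: average {avg:.2f}, minimum {low}, maximum {high}")
```

As in the table, relaxing the depth filter can only increase the number of SNPs genotyped in each individual.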