Wayne E Clarke, Isobel A Parkin, Humberto A Gajardo, Daniel J Gerhardt, Erin Higgins, Christine Sidebottom, Andrew G Sharpe, Rod J Snowdon, Maria L Federico, Federico L Iniguez-Luy.
Abstract
Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated, via quantitative trait locus (QTL) analysis, with traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and, to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty-eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome, generating a novel data resource for research and breeding in this crop species.
Year: 2013 PMID: 24312619 PMCID: PMC3849492 DOI: 10.1371/journal.pone.0081992
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Selected genomic regions underlying traits of agronomical and nutritional interest in B. napus.
Map integration was conducted according to common molecular markers and parental lines used in three different mapping studies [15, 22, 50]. QTL locations were inferred from relative map positions previously described [13,15,17-19, 22] always using common sets of molecular markers and genetic stocks.
Summary of Next Generation Sequencing (NGS) results.
| Genotype | Platform | Num. reads | Num. bases | Avg. length* | SD length | Avg. Q-score | SD Q-score |
|---|---|---|---|---|---|---|---|
| DH12075 | 454 | 1,289,496 | 450,850,003 | 350 | 131.6 | 33.1 | 8.4 |
| PSA12 | 454 | 826,680 | 289,078,010 | 350 | 101.1 | 32.6 | 8.5 |
| Express | 454 | 827,074 | 313,062,567 | 379 | 105.4 | 31.4 | 8.4 |
| V8 | 454 | 711,244 | 261,475,787 | 368 | 121.5 | 33.7 | 8.2 |
| Tapidor | 454 | 778,116 | 257,650,369 | 331 | 95.1 | 31.6 | 8.5 |
| Ningyou-7 | 454 | 803,553 | 288,413,071 | 359 | 112.8 | 33.2 | 7.9 |
| Rainbow | 454 | 742,283 | 248,197,873 | 334 | 129.9 | 33.7 | 8.1 |
| YN-429 | 454 | 735,005 | 254,920,458 | 347 | 101.3 | 32.4 | 8.5 |
| CGNA1 | 454 | 742,361 | 267,963,689 | 364 | 118.7 | 33.1 | 8.4 |
| CGNA2 | 454 | 717,016 | 270,568,629 | 374 | 117.6 | 33.3 | 8.3 |
| DH12075 | Illumina | 167,215,495 | 8,225,138,523 | 49 | - | 40 | 2.3 |
| Express | Illumina | 184,559,482 | 10,908,314,681 | 59 | - | 42 | 2.1 |
Abbreviations: Num.=Number; Avg. = Average; SD = Standard deviation; Q-score = Quality.
* Length in base pairs (bp).
Result summary of sequenced reads mapped against the A and C Brassica genomes.
| Genotype | Platform | SR | RM | RMp | SNP | RM | RMp | SNP |
|---|---|---|---|---|---|---|---|---|
| DH12075 | 454 | 1,289,496 | 414,569 | 32 | 159,442 | 775,554 | 60 | 263,195 |
| PSA12 | 454 | 826,680 | 241,867 | 29 | 78,683 | 497,595 | 60 | 138,398 |
| Express | 454 | 827,074 | 226,406 | 27 | 81,595 | 475,214 | 57 | 167,462 |
| V8 | 454 | 711,244 | 190,312 | 27 | 65,744 | 411,708 | 58 | 136,009 |
| Tapidor | 454 | 778,116 | 230,934 | 30 | 75,238 | 453,606 | 58 | 140,756 |
| Ningyou-7 | 454 | 803,553 | 240,828 | 30 | 89,273 | 480,899 | 60 | 150,945 |
| Rainbow | 454 | 742,283 | 207,465 | 28 | 67,546 | 432,873 | 58 | 132,747 |
| YN-429 | 454 | 735,005 | 219,859 | 30 | 76,381 | 427,669 | 58 | 134,101 |
| CGNA1 | 454 | 742,361 | 201,604 | 27 | 69,718 | 426,791 | 57 | 144,337 |
| CGNA2 | 454 | 717,016 | 195,811 | 27 | 68,147 | 412,298 | 58 | 141,580 |
| DH12075 | Illumina | 167,215,494 | 92,287,337 | 55 | 745,049 | 102,396,687 | 61 | 1,068,906 |
| Express | Illumina | 184,559,482 | 84,589,870 | 46 | 834,391 | 104,674,569 | 57 | 1,606,826 |
Abbreviations: SR=sequenced reads, RM=reads matching the reference genome, RMp=reads mapped to reference genome in %, SNP=single nucleotide polymorphism.
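The RMp values can be reproduced from the SR and RM columns. The following is a minimal sketch, assuming RMp is simply RM divided by SR expressed as a percentage; the function name `mapped_percent` is illustrative and does not come from the study.

```python
def mapped_percent(sequenced_reads: int, reads_matching: int) -> float:
    """Percentage of sequenced reads (SR) that matched the reference (RM).

    Assumption: RMp = 100 * RM / SR. The source defines RMp only as
    "reads mapped to reference genome in %".
    """
    if sequenced_reads <= 0:
        raise ValueError("sequenced_reads must be positive")
    return 100.0 * reads_matching / sequenced_reads


# SR and the two RM values reported for the 1,289,496-read 454 run.
sr = 1_289_496
for rm in (414_569, 775_554):
    print(f"RM={rm:,}  RMp={mapped_percent(sr, rm):.0f}%")
```

Under this assumption the two printed percentages round to the 32 and 60 reported for that run.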
Summary of SNP filtering criteria and discovery pipeline results.
| Filter | SNPs removed | SNPs remaining |
|---|---|---|
| None (unfiltered) | 0 | 2,740,205 |
| Multiple Variants hits | 33,917 | 2,706,288 |
| Heterozygous and Bias* | 2,111,439 | 594,849 |
| Flanking Sequence** | 5,482 | 589,367 |
* Removal of SNPs containing only heterozygous and bias SNP calls. It also removes SNPs with a percentage of heterozygous calls over a threshold (0.2).
** Removal of SNPs not meeting KASPar or Illumina Infinium flanking sequence requirements.
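The filtering steps are applied in sequence, so each count of remaining SNPs is the previous count minus the SNPs removed at that step. The sketch below checks that arithmetic and illustrates one possible form of the heterozygosity filter. It rests on several assumptions: the 0.2 threshold is treated as a fraction of calls, the call labels (`het`, `hom_ref`, `hom_alt`) and both function names are invented for illustration, and the bias criterion is not modelled because the source does not define it.

```python
from typing import List, Sequence

HET_THRESHOLD = 0.2  # value quoted in the footnote; treated here as a fraction


def passes_het_filter(calls: Sequence[str],
                      threshold: float = HET_THRESHOLD) -> bool:
    """Hypothetical per-SNP check on the share of heterozygous calls.

    `calls` holds one genotype call per sequenced line, using the
    made-up labels 'het', 'hom_ref' and 'hom_alt'.
    """
    if not calls:
        return False
    het_fraction = sum(call == "het" for call in calls) / len(calls)
    # A SNP with only heterozygous calls has het_fraction == 1.0 and fails too.
    return het_fraction <= threshold


def running_totals(initial: int, removed_per_step: Sequence[int]) -> List[int]:
    """SNPs remaining after each successive filter."""
    remaining = [initial]
    for removed in removed_per_step:
        remaining.append(remaining[-1] - removed)
    return remaining


# Counts taken from the filtering summary table.
print(running_totals(2_740_205, [33_917, 2_111_439, 5_482]))
```

The final running total is the 589,367 filtered SNPs reported in the abstract.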
Figure 2. Filtered SNP types and their classification according to genomic regions.
Transition and transversion SNP types were classified to show the proportion of SNPs annotated to each of three genomic regions (Intergenic, Intron, and CDS). CDS: coding sequence.
Figure 3. Filtered SNP counts characterization summary.
Total SNP counts were classified by genomic location (Intergenic vs. Genic) and further separated into transitions and transversions. Genic SNPs are also described in terms of their location within the gene (CDS vs. intron). CDS: coding sequence.
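For readers unfamiliar with the transition and transversion split used in these figures, the sketch below classifies a single biallelic SNP by the standard definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and any other substitution is a transversion. The function name `snp_type` is illustrative and is not taken from the study's pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    # Both alleles in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, alt, snp_type(ref, alt))
```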
Summary of sequence read mapping performance using multiple reference sequence sets.
| Genotype | Platform | SR | RM | RMpi | RM | RMpi | RM | RMpi |
|---|---|---|---|---|---|---|---|---|
| DH12075 | 454 | 1,289,496 | 414,569 | 0 | 457,448 | 10.3 | 775,554 | 69.5 |
| PSA12 | 454 | 826,680 | 241,867 | 0 | 279,959 | 15.7 | 497,595 | 77.7 |
| Express | 454 | 827,074 | 226,406 | 0 | 271,137 | 19.8 | 475,214 | 75.3 |
| V8 | 454 | 711,244 | 190,312 | 0 | 230,314 | 21 | 411,708 | 78.8 |
| Tapidor | 454 | 778,116 | 230,934 | 0 | 266,100 | 15.2 | 453,606 | 70.5 |
| Ningyou | 454 | 803,553 | 240,828 | 0 | 277,338 | 15.2 | 480,899 | 73.4 |
| Rainbow | 454 | 742,283 | 207,465 | 0 | 248,757 | 19.9 | 432,873 | 74 |
| YN-429 | 454 | 735,005 | 219,859 | 0 | 247,101 | 12.4 | 427,669 | 73.1 |
| CGNA1 | 454 | 742,361 | 201,604 | 0 | 241,700 | 19.9 | 426,791 | 76.6 |
| CGNA2 | 454 | 717,016 | 195,811 | 0 | 233,568 | 19.3 | 412,298 | 76.5 |
| DH12075 | Illumina | 167,215,494 | 92,287,337 | 0 | 74,985,743 | -18.7 | 102,396,687 | 36.6 |
| Express | Illumina | 184,559,482 | 84,589,870 | 0 | 78,787,995 | -6.9 | 104,674,569 | 32.9 |
Abbreviations: SR = sequenced reads, RM = reads matching the reference genome, RMpi = percentage increase in reads mapped over previous reference.
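RMpi can be recomputed from consecutive RM values. The sketch below assumes RMpi is the relative change in RM from one reference set to the next, with the first reference set fixed at 0; `percent_increase` is an illustrative name, not one used by the authors.

```python
def percent_increase(previous_rm: int, current_rm: int) -> float:
    """Relative change in mapped reads versus the previous reference, in %."""
    if previous_rm <= 0:
        raise ValueError("previous_rm must be positive")
    return 100.0 * (current_rm - previous_rm) / previous_rm


# RM values for DH12075 (454) across the three reference sets in the table.
rm_by_reference = [414_569, 457_448, 775_554]
rmpi = [0.0] + [
    percent_increase(prev, curr)
    for prev, curr in zip(rm_by_reference, rm_by_reference[1:])
]
print([round(value, 1) for value in rmpi])
```

A negative result, as in the Illumina rows, means fewer reads matched the second reference set than the first.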
Figure 4. Summary of DNA sequence coverage.
The average depth of coverage in captured and non-captured regions across all 19 A and C Brassica pseudomolecules is illustrated. Captured regions are those from the original sequence capture selection combined with the orthologous sequence from the complementary genome.
Singleplex KASPar SNP validation assay of a set of 100 discovered SNP markers.
| | Diversity Set | DH12075 x PSA12* | V8 x Express* | Both populations* |
|---|---|---|---|---|
| | 25 | 30 | 29 | - |
| SNPs tested** | 100 | 19 | 37 | 44 |
| No amplification | 12 | 0 | 4 | 8 |
| Monomorphic | 9 | 1 | 5 | 6 |
| Multiple Loci | 8 | 2 | 0 | 6 |
| Polymorphic*** | 71 | 16 | 28 | 24 |
| % PA | 88 | 100 | 89 | 82 |
Abbreviations: PA = Positive Amplification; PS = Polymorphic SNPs.
* Represents markers specifically designed from SNPs that showed polymorphism for a specific set of reference mapping parental lines: polymorphisms between the spring-type parents (DH12075 and PSA12), polymorphisms between the winter-type parents (V8 and Express), and polymorphisms detected for both sets of parental lines at the same SNP locus.
** Of the 100 SNPs tested, 73 were A genome-specific and 27 were C genome-specific.
*** Polymorphic SNPs also include two dominant (presence/absence of the tested allele) marker types.
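The % PA row is consistent with a simple calculation from the counts in the table. The sketch below assumes that the row reading 100, 19, 37 and 44 gives the number of SNPs tested per column, and that positive amplification means every tested SNP not scored as "No amplification". Both assumptions and the function name are illustrative, not the authors' stated method.

```python
def percent_positive_amplification(tested: int, no_amplification: int) -> float:
    """Share of tested SNP assays that amplified, in % (assumed definition)."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * (tested - no_amplification) / tested


# (SNPs tested, no amplification) per column of the KASPar table.
assay_counts = {
    "Diversity Set": (100, 12),
    "DH12075 x PSA12": (19, 0),
    "V8 x Express": (37, 4),
    "Both populations": (44, 8),
}
for column, (tested, failed) in assay_counts.items():
    rate = percent_positive_amplification(tested, failed)
    print(f"{column}: {rate:.0f}% PA")
```

Rounded to whole numbers, these rates match the 88, 100, 89 and 82 shown in the % PA row.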