Humira Sonah, Maxime Bastien, Elmer Iquira, Aurélie Tardivel, Gaétan Légaré, Brian Boyle, Éric Normandeau, Jérôme Laroche, Stéphane Larose, Martine Jean, François Belzile.
Abstract
Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost-effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot-scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads, which were processed using our pipeline. A total of 10,120 high-quality SNPs were obtained, and the distribution of these SNPs closely mirrored the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in throughput and a significant decrease in the per-sample cost. The approach developed here to obtain high-quality SNPs will be helpful for marker-assisted genomics as well as for the assessment of available genetic resources for effective utilisation in a wide range of species.
Year: 2013 PMID: 23372741 PMCID: PMC3553054 DOI: 10.1371/journal.pone.0054603
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart showing the steps performed to identify SNPs in the IGST-GBS pipeline.
The process can be divided into three main steps: data processing, mapping and SNP calling.
Figure 2. In silico analysis of restriction enzyme sites in the soybean genome.
Fragment size distribution obtained by in silico digestion of soybean chromosome 5 with the ApeKI, PstI and MseI restriction enzymes, showing a higher percentage of ApeKI fragments in a size range suitable for genotyping by sequencing.
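An in silico digestion of this kind can be sketched in a few lines of Python. The block below is a minimal illustration, not the authors' script: it assumes a complete digestion of a linear sequence, uses the standard recognition sites of the three enzymes (ApeKI G^CWGC, PstI CTGCA^G, MseI T^TAA), and the function names `digest` and `size_histogram` are hypothetical.

```python
import re
from collections import Counter

# Recognition site (as a regex; W = A or T) and cut offset from the site start.
# All three sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "ApeKI": ("GC[AT]GC", 1),  # G^CWGC
    "PstI": ("CTGCAG", 5),     # CTGCA^G
    "MseI": ("TTAA", 1),       # T^TAA
}


def digest(sequence, enzyme):
    """Return fragment lengths from a complete digestion of a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead is used so that overlapping sites are all reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def size_histogram(lengths, bin_width=100):
    """Count fragments per size bin, keyed by the lower bound of each bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


# Toy sequence with two ApeKI sites (GCAGC and GCTGC)
toy = "AAAGCAGCTTTTGCTGCAAA"
print(digest(toy, "ApeKI"))
```

Applied to a real chromosome sequence, the histogram for each enzyme would give the kind of fragment size distribution the figure compares.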
Summary of sequenced raw and processed reads in eight soybean genotypes obtained on an Illumina Genome Analyzer II.
| Genotypes | Maple Donovan | Toma | S19-90 | Williams 82 | PS46RR | TGx1989-53F | TGx1990-67F | Ocepara-4 | Total |
| Raw reads | 540,827 | 805,460 | 763,541 | 877,607 | 578,458 | 440,636 | 526,300 | 1,003,014 | 5,535,843 |
| | 98.77 | 98.75 | 98.77 | 98.77 | 98.76 | 98.76 | 98.76 | 98.76 | 98.76 |
| | 82.58 | 85.58 | 84.64 | 86.96 | 85.23 | 83.47 | 85.60 | 84.64 | 85.00 |
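As a quick consistency check, the per-genotype counts can be summed and compared with the reported total. This sketch assumes the row of counts gives raw reads per genotype, which matches the 5.5 M reads stated in the abstract; the variable names are illustrative.

```python
# Read counts per genotype, copied from the table above
# (assumed here to be raw reads per sample).
raw_reads = {
    "Maple Donovan": 540_827,
    "Toma": 805_460,
    "S19-90": 763_541,
    "Williams 82": 877_607,
    "PS46RR": 578_458,
    "TGx1989-53F": 440_636,
    "TGx1990-67F": 526_300,
    "Ocepara-4": 1_003_014,
}

total = sum(raw_reads.values())

# Share of the sequencing lane taken by each genotype, in percent
shares = {name: 100 * n / total for name, n in raw_reads.items()}

print(f"total reads: {total:,}")
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {pct:5.1f}%")
```

The spread in shares gives a rough sense of how evenly the eight barcoded samples were represented in the lane.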
Figure 3. Sequence coverage and SNP distribution.
(a) Distribution of mapped sequence reads (scaled down to 1/10) and SNPs identified using a GBS approach, and (b) corresponding frequency of genes and transposons identified in the same bins on soybean chromosome 5. Transposons and genes were retrieved from the SoyBase and Phytozome databases, respectively (www.soybase.org, www.phytozome.org).
Validation of single nucleotide polymorphism calls by Sanger sequencing.
| Genotypes | Concordant AA | Concordant AB | Concordant BB | Discordant AA | Discordant AB | Discordant BB | Validation (%) |
| Set A | 115 | 3 | 71 | 0 | 1 | 2 | 98.4 |
| Set B | 111 | 9 | 67 | 0 | 5 | 0 | 97.4 |
Set A = Maple Donovan, Toma, S19-90, Williams 82, PS46RR, TGx1989-53F, TGx1990-67F, Ocepara-4.
Set B = QS4003.28B, OAC Thames, OAC Eramosa, OAC09-01C, AC Harmony, PI159925, X5331-1-S1-1S-3-B, X5194-1-54-2-1-B.
AA = homozygous for the reference allele, AB = heterozygous, BB = homozygous for the alternate allele.
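The validation percentages follow directly from the concordant and discordant counts in the table. The short sketch below reproduces them; the function name `validation_rate` is illustrative.

```python
def validation_rate(concordant, discordant):
    """Percentage of genotype calls confirmed by Sanger sequencing.

    Each argument is a tuple of counts in the order (AA, AB, BB).
    """
    confirmed = sum(concordant)
    total = confirmed + sum(discordant)
    return 100 * confirmed / total


set_a = validation_rate(concordant=(115, 3, 71), discordant=(0, 1, 2))
set_b = validation_rate(concordant=(111, 9, 67), discordant=(0, 5, 0))

print(f"Set A: {set_a:.1f}%")  # 189 of 192 calls confirmed
print(f"Set B: {set_b:.1f}%")  # 187 of 192 calls confirmed
```

In Set B all five discordant calls are in the heterozygous (AB) class, whereas Set A has one discordant AB call and two discordant BB calls.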
Figure 4. Distribution of SNPs by their location within predicted gene models in the soybean genome.
SNPs were categorised using gene structure information retrieved from Phytozome (www.phytozome.org).
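The percentages reported in the abstract can be turned into approximate SNP counts per category. This is a back-of-the-envelope sketch: the inputs are rounded percentages, so the resulting counts are estimates rather than figures taken from the source.

```python
total_snps = 10_120          # high-quality SNPs reported in the abstract
genic_fraction = 0.395       # share of SNPs located in genic regions
cds_within_genic = 0.525     # share of genic SNPs located in coding sequence

genic_snps = total_snps * genic_fraction
cds_snps = genic_snps * cds_within_genic
cds_share_of_all = 100 * genic_fraction * cds_within_genic

print(f"genic SNPs (approx.):  {round(genic_snps)}")
print(f"coding SNPs (approx.): {round(cds_snps)}")
print(f"coding SNPs as share of all SNPs: {cds_share_of_all:.1f}%")
```

Under these figures, roughly one SNP in five falls within coding sequence.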
Figure 5. Phylogenetic tree showing genetic distance among a set of eight diverse soybean cultivars.
The phylogenetic tree was constructed on the basis of 10,120 SNPs identified using the GBS approach.
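A tree of this kind starts from a pairwise distance matrix computed over the SNP genotypes. The source does not state which distance measure or tree-building method was used, so the sketch below is only one plausible choice: genotypes are coded as the number of alternate alleles (0, 1 or 2, with `None` for missing calls), and distance is the mean allele difference over sites called in both lines. The function names and the toy sample names are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Mean per-site allele difference, scaled to the range 0..1.

    Sites with a missing call (None) in either line are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no sites with calls in both lines")
    return sum(abs(x - y) for x, y in pairs) / (2 * len(pairs))


def distance_matrix(genotypes):
    """Symmetric distance lookup keyed by (name, name) pairs."""
    names = list(genotypes)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        dist[p, q] = dist[q, p] = snp_distance(genotypes[p], genotypes[q])
    return dist


# Toy data: three hypothetical lines genotyped at four SNPs
toy = {
    "line1": [0, 0, 2, 2],
    "line2": [0, 2, 2, None],
    "line3": [2, 2, 0, 0],
}
d = distance_matrix(toy)
print(d["line1", "line2"], d["line1", "line3"], d["line2", "line3"])
```

The resulting matrix could then be passed to a clustering method such as neighbour joining or UPGMA to draw the tree.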
Figure 6. Impact of selective amplification on the number and depth of coverage of SNPs.
(a) Schematic representation of an ApeKI restriction fragment flanked by suitable ligated adapters and the position of standard or selective primers, (b) comparison of the number of SNPs and sequence read depth obtained with different sets of primers, and (c) number of SNPs and mean depth of coverage observed with selective amplification of ApeKI-digested fragments with AC selective primers at different levels of multiplexing.
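The effect of selective primers on library complexity can be illustrated with a simplified model. The sketch below assumes that a selective primer amplifies only fragments whose bases immediately inside the restriction-site remnant match the primer's extra 3' bases, and that bases are uniformly distributed; both are simplifying assumptions, and the function names are hypothetical rather than part of the published protocol.

```python
def expected_fraction(n_selective_bases):
    """Expected share of fragments retained under uniform base composition."""
    return 0.25 ** n_selective_bases


def passes_selection(fragment, selective_bases, remnant_len=4):
    """True if the bases just inside the site remnant match the selective bases.

    remnant_len=4 corresponds to the CWGC left on a fragment end after an
    ApeKI cut (G^CWGC).
    """
    window = fragment[remnant_len:remnant_len + len(selective_bases)]
    return window.upper() == selective_bases.upper()


# Toy fragment ends: site remnant (CAGC or CTGC) followed by genomic sequence
fragments = ["CAGCACGGTT", "CTGCTTGGAA", "CAGCACAAAA", "CTGCGCACAC"]
kept = [f for f in fragments if passes_selection(f, "AC")]

print(kept)
print(expected_fraction(2))  # two selective bases
```

Because fewer distinct fragments are amplified, the same number of reads is spread over fewer loci, which is consistent with the higher depth of coverage reported for the selective primers.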