Hoa T Truong, A Marcos Ramos, Feyruz Yalcin, Marjo de Ruiter, Hein J A van der Poel, Koen H J Huvenaars, René C J Hogers, Leonora J G van Enckevort, Antoine Janssen, Nathalie J van Orsouw, Michiel J T van Eijk.
Abstract
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless of whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike.
Year: 2012 PMID: 22662172 PMCID: PMC3360789 DOI: 10.1371/journal.pone.0037565
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of SBG.
(A) The sequencing complexity of genomic DNA is reduced using a combination of rare and frequent cutting enzymes. (B) Sequencing adapters containing sample identification tags are ligated to the restriction fragments to construct SBG libraries. SBG libraries are amplified and sequenced using Illumina sequencing platforms. Only read 1 will be sequenced for single-end sequencing, while both read 1 and read 2 will be sequenced for paired-end sequencing. (C) SNPs are mined between the samples and simultaneously genotyped using the SBG bioinformatics analysis workflow.
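The complexity reduction in panel (A) can be illustrated with a small in-silico digest. This is only a sketch under stated assumptions: the caption does not name the enzymes, so the recognition sites below (`CTGCAG` for a rare cutter, `TTAA` for a frequent cutter) are illustrative choices, cuts are placed at the start of each site rather than at the true cleavage position, and keeping only fragments with one rare and one frequent end is an assumption about how the library is enriched.

```python
import re

RARE_SITE = "CTGCAG"    # assumed rare-cutter site (illustrative, not from the source)
FREQUENT_SITE = "TTAA"  # assumed frequent-cutter site (illustrative, not from the source)


def cut_positions(seq, site):
    """Return the start index of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def mixed_end_fragments(seq, rare=RARE_SITE, frequent=FREQUENT_SITE):
    """Return fragments bounded by one rare site and one frequent site.

    Simplification: each fragment starts at the beginning of its left-hand
    recognition site and ends just before the next site.
    """
    cuts = [(p, "rare") for p in cut_positions(seq, rare)]
    cuts += [(p, "frequent") for p in cut_positions(seq, frequent)]
    cuts.sort()
    fragments = []
    # Walk over neighbouring cut sites and keep only mixed-end fragments.
    for (start, kind_a), (end, kind_b) in zip(cuts, cuts[1:]):
        if kind_a != kind_b:
            fragments.append(seq[start:end])
    return fragments


if __name__ == "__main__":
    toy_genome = "AAACTGCAGGGGTTAACCCCTTAAGGGCTGCAGTT"
    print(mixed_end_fragments(toy_genome))
```

Only a subset of the toy sequence survives the digest, which is the point of the reduction step: the same subset of loci is sampled in every individual.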
Figure 2. Bioinformatics analysis workflow for SBG.
The Illumina data are first processed to remove low quality reads. The reference sequences are generated by clustering the unique reads present within the dataset. The reads are subsequently aligned to the reference sequences and variation is called using the GATK Unified Genotyper. Lastly, the final set of SNPs and genotypes is generated by removing SNPs that do not meet the thresholds for percentage of missing data and expected genotypic frequencies.
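The final filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption gives neither the thresholds nor the statistical test, so the missing-data cutoff (`max_missing`), the chi-square goodness-of-fit test, and the genotype codes `"A"`, `"H"`, `"B"` are all assumptions made for the example. The default cutoff 5.991 is the chi-square critical value for 2 degrees of freedom at the 0.05 level.

```python
from collections import Counter


def passes_filters(calls, expected, max_missing=0.2, chi2_cutoff=5.991):
    """Decide whether one SNP is kept.

    calls       -- genotype calls per sample; None marks a missing call
    expected    -- expected frequency per genotype class, e.g. for an F2:
                   {"A": 0.25, "H": 0.5, "B": 0.25} (hypothetical encoding)
    max_missing -- hypothetical cutoff on the fraction of missing calls
    chi2_cutoff -- hypothetical cutoff on the chi-square statistic
    """
    n_total = len(calls)
    observed = [c for c in calls if c is not None]
    if n_total == 0 or not observed:
        return False

    # Filter 1: too many missing genotypes.
    missing_fraction = 1 - len(observed) / n_total
    if missing_fraction > max_missing:
        return False

    # Filter 2: genotype frequencies too far from expectation.
    counts = Counter(observed)
    n = len(observed)
    chi2 = sum(
        (counts.get(genotype, 0) - n * p) ** 2 / (n * p)
        for genotype, p in expected.items()
    )
    return chi2 <= chi2_cutoff


if __name__ == "__main__":
    f2 = {"A": 0.25, "H": 0.5, "B": 0.25}
    balanced = ["A"] * 25 + ["H"] * 50 + ["B"] * 25
    skewed = ["A"] * 90 + ["H"] * 5 + ["B"] * 5
    print(passes_filters(balanced, f2), passes_filters(skewed, f2))
```

The expected frequencies depend on the population type, so they are passed in rather than hard-coded.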
Summary statistics for generating the reference sequences.
| | Arabidopsis | Lettuce read 1 | Lettuce read 2 |
|---|---|---|---|
| | 122,573,199 | 220,953,145 | 253,109,987 |
| | 110,849,880 | 203,441,535 | 239,282,874 |
| | 90.4 | 92.1 | 94.5 |
| | 3,500,146 | 9,869,623 | 18,849,951 |
| | 18,500 | 161,974 | 241,676 |
| | 13,321 | 107,661 | 168,759 |
Variant calling for the arabidopsis and lettuce sequence datasets.
| Sequence dataset | Total number of variants | Number of contigs |
|---|---|---|
| Arabidopsis | 6,799 | 3,360 |
| Lettuce read 1 | 152,210 | 39,994 |
| Lettuce read 2 | 321,566 | 60,279 |
Parent-based SNP genotyping in the arabidopsis and lettuce sequence datasets.
| | Arabidopsis | Lettuce read 1 | Lettuce read 2 | Lettuce all |
|---|---|---|---|---|
| Number of SNPs | 1,409 | 1,918 | 3,665 | 5,583 |
| Number of genotypes | 273,992 | 79,674 | 135,021 | 214,695 |
| Genotypes per SNP | 194.5 | 41.5 | 36.8 | 38.5 |
| | 3,303 | 36,627 | 63,344 | 99,971 |
| | 1.2 | 46.0 | 46.9 | 46.6 |
| | 139,628 | 35,787 | 58,734 | 94,521 |
| | 51.0 | 44.9 | 43.5 | 44.0 |
| | 131,061 | 7,260 | 12,943 | 20,203 |
| | 47.8 | 9.1 | 9.6 | 9.4 |
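The derived values in this table can be reproduced from the counts. In each column the three count rows sum to the total number of genotypes, the ratio of total genotypes to SNPs gives the genotypes/SNP rate quoted in the abstract, and each percentage row is the preceding count divided by that total. The sketch below checks this for the Arabidopsis column; the function name and the neutral "class" ordering are illustrative, since the row labels did not survive in this copy.

```python
def summarize_genotypes(n_snps, class_counts):
    """Recompute the derived rows of the genotyping table.

    n_snps       -- number of SNPs in the dataset
    class_counts -- genotype counts per class, in table order
                    (class names are not available in this copy)
    Returns (total genotypes, genotypes per SNP, percentage per class).
    """
    total = sum(class_counts)
    genotypes_per_snp = round(total / n_snps, 1)
    percentages = [round(100 * count / total, 1) for count in class_counts]
    return total, genotypes_per_snp, percentages


if __name__ == "__main__":
    # Arabidopsis column: 1,409 SNPs; class counts 3,303 / 139,628 / 131,061.
    print(summarize_genotypes(1409, [3303, 139628, 131061]))
    # Combined lettuce column: 5,583 SNPs; class counts 99,971 / 94,521 / 20,203.
    print(summarize_genotypes(5583, [99971, 94521, 20203]))
```

The same check applies to the filtered table, where the totals and rates change but the arithmetic does not.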
Parent-based SNP genotyping in the arabidopsis and lettuce sequence datasets after removing SNPs displaying extreme genotypic frequencies and an excessive number of missing genotypes.
| | Arabidopsis | Lettuce read 1 | Lettuce read 2 | Lettuce all |
|---|---|---|---|---|
| Number of SNPs | 1,245 | 589 | 637 | 1,226 |
| Number of genotypes | 250,517 | 34,991 | 36,440 | 71,431 |
| Genotypes per SNP | 201.2 | 59.4 | 57.2 | 58.3 |
| | 2,035 | 16,626 | 17,407 | 34,033 |
| | 0.8 | 47.5 | 47.8 | 47.6 |
| | 128,773 | 17,665 | 18,299 | 35,964 |
| | 51.4 | 50.5 | 50.2 | 50.3 |
| | 119,709 | 700 | 734 | 1,434 |
| | 47.8 | 2.0 | 2.0 | 2.0 |