Dario Grattapaglia, Orzenil B Silva-Junior, Matias Kirst, Bruno Marco de Lima, Danielle A Faria, Georgios J Pappas.
Abstract
BACKGROUND: High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large-scale SNP development has been reported for several mainstream crops. Interest is now growing in extending the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue is exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants that is likely to become significant for population genomics and interspecific breeding applications in less domesticated and less funded plant genera.
Year: 2011 PMID: 21492434 PMCID: PMC3090336 DOI: 10.1186/1471-2229-11-65
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Summary of the EST assembly for SNP discovery
| Sequencing | # sequences used | # sequences in |
|---|---|---|
| Sanger | 67,635 | 50,720 |
| | 30,260 | 10,088 |
| | 7,755 | 4,387 |
| | 19,586 | 7,018 |
| | 9,679 | 4,959 |
| | 1,126 | 1,095 |
| 454 | 1,028,654 | 623,922 |
| TOTAL | 1,164,695 | 702,009 |
Figure 1. Flowchart with the output results of the EST clustering, contig assembly and SNP discovery pipeline prior to applying SNP filtering and selection for the GGGT assay design.
Summary of the in silico SNP development procedure using increasingly stringent SNP selection and design requirements (F0 through F4) (see methods for details)
| In silico SNP performance assessment | F0 | F1 | F2 | F3 | F4 |
|---|---|---|---|---|---|
| | 66,254 | 21,944 | 10,032 | 3,187 | 1,329 |
| | 9,579 | 5,058 | 2,057 | 1,651 | 998 |
| | 621 | 605 | 583 | 367 | 547 |
| | 598 | 572 | 557 | 353 | 525 |
| | 96.3 | 94.5 | 95.5 | 96.2 | 96.0 |
| | 314 | 316 | 297 | 177 | 291 |
| | 50.6 | 52.2 | 50.9 | 48.2 | 53.2 |
| | 96 | 96 | 108 | 108 | 288 |
Figure 2. Distribution of the percentages of SNPs across classes of (a) GeneTrain Score; (b) GeneCall50 Score and (c) Minimum Allele Frequency (MAF). Broken bars histograms are presented for all 768 SNPs together (ALL) and for each SNP category within the 696 genome-wide SNPs selected by the different in silico filtering levels (F0 through F4 - see methods) and the 72 candidate gene (CG) SNPs.
Summary of the in vitro SNP genotyping performance assessed in a panel of 96 individuals from five Eucalyptus species
| In vitro SNP performance assessed | Candidate genes | F0 | F1 | F2 | F3 | F4 | Total counts | % |
|---|---|---|---|---|---|---|---|---|
| # SNPs tested by the GGGT | 72 | 96 | 96 | 108 | 108 | 288 | 768 | - |
| Average SNP Call Rate (%) | 91.0 | 95.2 | 90.0 | 94.9 | 95.0 | 97.8 | - | |
| # SNP with Call rate ≥ 0.95 | 58 | 81 | 74 | 90 | 97 | 268 | 668 | 87.0 |
| % SNP with Call rate ≥ 0.95 | 80.6 | 84.4 | 77.1 | 83.3 | 89.8 | 93.1 | - | |
| Average SNP GeneTrain score | 0.61 | 0.68 | 0.66 | 0.71 | 0.67 | 0.72 | - | |
| # SNPs with GeneTrain score ≥ 0.40 | 64 | 90 | 90 | 100 | 101 | 278 | 723 | 94.1 |
| % SNPs with GeneTrain score ≥ 0.40 | 88.9 | 93.8 | 93.8 | 92.6 | 93.5 | 96.5 | - | |
| Average SNP GC50 score | 0.57 | 0.59 | 0.59 | 0.64 | 0.62 | 0.67 | - | |
| # SNPs with GC50 score ≥ 0.40 | 63 | 89 | 89 | 100 | 101 | 277 | 719 | 93.6 |
| % SNPs with GC50 score ≥ 0.40 | 87.5 | 92.7 | 92.7 | 92.6 | 93.5 | 96.2 | - | |
| Average MAF of SNPs with MAF ≥ 0.05 | 0.26 | 0.24 | 0.25 | 0.26 | 0.25 | 0.27 | - | |
| # SNP with MAF > 0.05 | 51 | 48 | 55 | 75 | 74 | 205 | 508 | 66.1 |
| % SNP with MAF > 0.05 | 70.8 | 50.0 | 57.3 | 69.4 | 68.5 | 71.2 | - | |
Averages and SNP counts above specific thresholds of SNP reliability parameters (Call Rate, GeneCall50, GeneTrain scores) and polymorphism (MAF) for SNPs in pre-selected candidate genes and for genome-wide SNPs selected with increasingly stringent in silico SNP selection and design requirements (F0 through F4 - see methods for details).
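The percentage rows in the table above follow from the count rows: each percentage is the count divided by the number of SNPs tested in that category, and the final column uses all 768 SNPs as the denominator. The sketch below recomputes them as a consistency check. The names `GROUPS`, `TESTED`, `TABLE`, `agrees` and `check_table` are illustrative only and are not part of the original work; the values are transcribed from the table, and a tolerance of half a unit in the last reported decimal is assumed.

```python
# Counts transcribed from the in vitro performance table above.
# Column order: candidate genes, then filtering levels F0 through F4.
GROUPS = ["Candidate genes", "F0", "F1", "F2", "F3", "F4"]
TESTED = [72, 96, 96, 108, 108, 288]

# metric -> (counts per group, reported % per group,
#            reported total count, reported total %)
TABLE = {
    "Call rate >= 0.95": ([58, 81, 74, 90, 97, 268],
                          [80.6, 84.4, 77.1, 83.3, 89.8, 93.1], 668, 87.0),
    "GeneTrain >= 0.40": ([64, 90, 90, 100, 101, 278],
                          [88.9, 93.8, 93.8, 92.6, 93.5, 96.5], 723, 94.1),
    "GC50 >= 0.40": ([63, 89, 89, 100, 101, 277],
                     [87.5, 92.7, 92.7, 92.6, 93.5, 96.2], 719, 93.6),
    "MAF > 0.05": ([51, 48, 55, 75, 74, 205],
                   [70.8, 50.0, 57.3, 69.4, 68.5, 71.2], 508, 66.1),
}


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def agrees(computed, reported, tol=0.05 + 1e-9):
    """True if a value reported to one decimal matches the computed one."""
    return abs(computed - reported) <= tol


def check_table():
    """Return a list of (metric, cell) entries that do not reconcile."""
    problems = []
    n_total = sum(TESTED)
    for metric, (counts, pcts, total, total_pct) in TABLE.items():
        for group, count, n, pct in zip(GROUPS, counts, TESTED, pcts):
            if not agrees(percent(count, n), pct):
                problems.append((metric, group))
        if sum(counts) != total:
            problems.append((metric, "total count"))
        if not agrees(percent(total, n_total), total_pct):
            problems.append((metric, "total %"))
    return problems


if __name__ == "__main__":
    print(check_table() or "all reported percentages and totals reconcile")
```

Comparing with a tolerance rather than calling `round` avoids depending on how ties such as 93.75 are rounded.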
Counts and percentages of polymorphic SNPs (MAF ≥ 0.05) from a total of 711 reliable SNPs, in each one of the five main Eucalyptus species surveyed (diagonal) and in pair-wise sets of species (above the diagonal)
| 209 (29.4%) | 117 (16.5%) | 128 (18.0%) | 194 (27.3%) | ||
| 107 (15.0%) | 120 (16.9%) | 187 (26.3%) | |||
| 104 (14.6%) | 118 (16.6%) | ||||
| 127 (17.9%) | |||||
Summary of SNP reliability across species, sections and subgenera of Eucalyptus as measured by the number of SNP meeting the thresholds of call rate and GeneCall50 for two groups of SNPs that differed regarding the flanking sequence constraints during in silico SNP mining and GGGT assay design
| | | SNPs selected with no flanking sequence | | | | SNPs selected with no additional SNPs in | | | |
|---|---|---|---|---|---|---|---|---|---|
| Subgenera/Section | Species | # SNPs ≥ 95% | % SNPs ≥ 95% | # SNPs ≥ 0.40 | % SNPs ≥ 0.40 | # SNPs ≥ 95% | % SNPs ≥ 95% | # SNPs ≥ 0.40 | % SNPs ≥ 0.40 |
| | | 323 | 86.8 | 333 | 89.5 | 378 | 95.5 | 378 | 95.5 |
| | | 310 | 83.3 | 335 | 90.1 | 369 | 93.2 | 377 | 95.2 |
| | | 279 | 75.0 | 328 | 88.2 | 343 | 86.6 | 376 | 94.9 |
| | | 325 | 87.4 | 331 | 89.0 | 369 | 93.2 | 374 | 94.4 |
| | | 311 | 83.6 | 327 | 87.9 | 369 | 93.2 | 375 | 94.7 |
| | | 295 | 79.3 | 324 | 87.1 | 361 | 91.2 | 371 | 93.7 |
| | | 300 | 80.6 | 325 | 87.4 | 353 | 89.1 | 370 | 93.4 |
| | | 289 | 77.7 | 336 | 90.3 | 339 | 85.6 | 376 | 94.9 |
| | | 281 | 75.5 | 319 | 85.8 | 330 | 83.3 | 365 | 92.2 |
| | | 194 | 52.2 | 271 | 72.8 | 246 | 62.1 | 325 | 82.1 |
| | | 166 | 44.6 | 223 | 59.9 | 198 | 50.0 | 278 | 70.2 |
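The table above reports counts and percentages for two groups of SNPs but does not state how many SNPs each group contains. The sketch below back-solves the group sizes, assuming each percentage is simply the count divided by a fixed group size and rounded to one decimal. It searches for every integer size consistent with all rows. The names `ROWS`, `pairs` and `implied_sizes` are hypothetical, the upper bound of 768 is taken from the total number of SNPs tested, and the numbers are transcribed from the table in row order.

```python
# Each row: (count, %) for call rate >= 95% and for GC50 >= 0.40,
# first for the left-hand SNP group, then for the right-hand group.
ROWS = [
    (323, 86.8, 333, 89.5, 378, 95.5, 378, 95.5),
    (310, 83.3, 335, 90.1, 369, 93.2, 377, 95.2),
    (279, 75.0, 328, 88.2, 343, 86.6, 376, 94.9),
    (325, 87.4, 331, 89.0, 369, 93.2, 374, 94.4),
    (311, 83.6, 327, 87.9, 369, 93.2, 375, 94.7),
    (295, 79.3, 324, 87.1, 361, 91.2, 371, 93.7),
    (300, 80.6, 325, 87.4, 353, 89.1, 370, 93.4),
    (289, 77.7, 336, 90.3, 339, 85.6, 376, 94.9),
    (281, 75.5, 319, 85.8, 330, 83.3, 365, 92.2),
    (194, 52.2, 271, 72.8, 246, 62.1, 325, 82.1),
    (166, 44.6, 223, 59.9, 198, 50.0, 278, 70.2),
]


def pairs(rows, offsets):
    """Collect (count, reported %) pairs found at the given column offsets."""
    return [(row[i], row[i + 1]) for row in rows for i in offsets]


def implied_sizes(count_pct_pairs, max_n=768, tol=0.05 + 1e-9):
    """Return every group size n for which all percentages reconcile."""
    return [
        n for n in range(1, max_n + 1)
        if all(c <= n and abs(100.0 * c / n - p) <= tol
               for c, p in count_pct_pairs)
    ]


if __name__ == "__main__":
    left = implied_sizes(pairs(ROWS, (0, 2)))
    right = implied_sizes(pairs(ROWS, (4, 6)))
    print("left group size candidates:", left)
    print("right group size candidates:", right)
```

With the rows as transcribed, the search leaves a single candidate for each group, 372 and 396, which together account for the 768 SNPs tested. This is an inference from the reported numbers, not a figure stated in the table itself.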