Salla Vartia, José L Villanueva-Cañas, John Finarelli, Edward D Farrell, Patrick C Collins, Graham M Hughes, Jeanette E L Carlsson, David T Gauthier, Philip McGinnity, Thomas F Cross, Richard D FitzGerald, Luca Mirimin, Fiona Crispie, Paul D Cotter, Jens Carlsson.
Abstract
This study examines the potential of next-generation sequencing based 'genotyping-by-sequencing' (GBS) of microsatellite loci for rapid and cost-effective genotyping in large-scale population genetic studies. The recovery of individual genotypes from large sequence pools was achieved by PCR-incorporated combinatorial barcoding using universal primers. Three experimental conditions were employed to explore the possibility of using this approach with existing and novel multiplex marker panels and with weighted amplicon mixtures. The GBS approach was validated against microsatellite data generated by capillary electrophoresis. GBS allows access to the underlying nucleotide sequences, which can reveal homoplasy even in large datasets, and it facilitates cross-laboratory transfer. GBS of microsatellites, using individual combinatorial barcoding, is potentially faster and cheaper than current microsatellite approaches and offers better and more data.
Keywords: Gadus morhua; amplicon sequencing; genotyping by sequencing; next-generation sequencing; ssr; universal primer
Year: 2016 PMID: 26909185 PMCID: PMC4736940 DOI: 10.1098/rsos.150565
Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963
Figure 1. Diagram of the four-primer PCR and the structure of the resulting amplicon.
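The idea of combinatorial barcoding mentioned in the abstract can be illustrated with a minimal sketch: each individual is identified by a pair of barcodes, one on each end of the amplicon, so a small set of barcoded primers can label many individuals. Everything below is a hypothetical illustration, not the authors' pipeline: the barcode sequences, the sample sheet, the function names, and the assumed read layout (forward barcode at the start of the read, reverse complement of the reverse barcode at the end) are assumptions made for the example.

```python
from collections import defaultdict

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def demultiplex(reads, fwd_barcodes, rev_barcodes, sample_sheet):
    """Assign reads to individuals by their (forward, reverse) barcode pair.

    Assumed layout (hypothetical): the read starts with the forward barcode
    and ends with the reverse complement of the reverse barcode.
    Reads whose barcode pair is not in the sample sheet are dropped.
    """
    by_sample = defaultdict(list)
    for read in reads:
        fwd = next((name for name, bc in fwd_barcodes.items()
                    if read.startswith(bc)), None)
        rev = next((name for name, bc in rev_barcodes.items()
                    if read.endswith(reverse_complement(bc))), None)
        sample = sample_sheet.get((fwd, rev))
        if sample is not None:
            by_sample[sample].append(read)
    return dict(by_sample)

# Hypothetical barcodes: 2 forward x 2 reverse barcodes label 4 individuals.
fwd_barcodes = {"F1": "AACC", "F2": "GGTT"}
rev_barcodes = {"R1": "ACAC", "R2": "CTCT"}
sample_sheet = {
    ("F1", "R1"): "ind1", ("F1", "R2"): "ind2",
    ("F2", "R1"): "ind3", ("F2", "R2"): "ind4",
}
reads = [
    "AACC" + "TATATA" + "GTGT",  # F1 + insert + revcomp(R1)
    "GGTT" + "TATA" + "AGAG",    # F2 + insert + revcomp(R2)
    "CCCC" + "TATA" + "GTGT",    # unknown forward barcode, dropped
]
assigned = demultiplex(reads, fwd_barcodes, rev_barcodes, sample_sheet)
```

The number of distinguishable individuals grows as the product of the forward and reverse barcode counts, which is what makes the combinatorial scheme economical.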
AICc model selection for the reduced (‘Celtic Sea’) dataset on the read yield. (Sample size is 90194. K is the number of parameters estimated for a given model structure, LogL is the log-likelihood of the model, AICc is the finite-sample AIC score for the model, dAICc is the difference in AICc score between the given model and the optimal model score, and Post. Prob is the model posterior probability.)
| model description | K | LogL | AICc | dAICc | Post. Prob. |
|---|---|---|---|---|---|
| PCR | 3 | −87 627 | 175 260 | 0 | 1.000 |
| tails | 4 | −123 574 | 247 157 | 71 896 | 0 |
| forward | 6 | −143 882 | 287 776 | 112 516 | 0 |
| reverse | 8 | −157 688 | 315 391 | 140 131 | 0 |
| PCR × tails | 12 | −210 436 | 420 896 | 245 635 | 0 |
| individuals | 16 | −216 345 | 432 722 | 257 461 | 0 |
| forward × tails | 24 | −266 358 | 532 764 | 357 504 | 0 |
| PCR × reverse | 24 | −245 065 | 490 177 | 314 917 | 0 |
| reverse × tails | 32 | −281 170 | 562 403 | 387 143 | 0 |
| locus | 53 | −318 574 | 637 253 | 461 993 | 0 |
| PCR × reverse × tails | 96 | −367 637 | 735 467 | 560 206 | 0 |
Model selection using AICc on the correspondence between 454 microsatellites and ABI microsatellites, using the full dataset. (K is the number of parameters estimated for a given model structure, LogL is the log-likelihood of the model, AICc is the finite-sample AIC score for the model, dAICc is the difference in AICc score between the given model and the optimal model score, and Post. Prob is the model posterior probability.)
| model | model comments | K | LogL | AICc | dAICc | Post. Prob. |
|---|---|---|---|---|---|---|
| no effects | average over all data | 1 | −1724.64 | 3451.28 | 131.96 | 2.22 × 10⁻²⁹ |
| no. reads (2) | 0–5 versus 5+ reads | 2 | −1693.93 | 3391.87 | 72.55 | 1.77 × 10⁻¹⁶ |
| no. reads (3) | 0–5 versus 5–10 versus 10+ reads | 3 | −1676.04 | 3358.10 | 38.78 | 3.80 × 10⁻⁹ |
| MST type | | 5 | −1693.54 | 3397.11 | 77.79 | 1.29 × 10⁻¹⁷ |
| reads (2) by MST type | 0–5 versus 5+ reads | 10 | −1663.76 | 3347.59 | 28.27 | 7.26 × 10⁻⁷ |
| reads (3) by MST type | 0–5 versus 5–10 versus 10+ reads | 15 | −1644.58 | 3319.32 | 0.00 | 0.999 |
| PCR (3,6) v. 18 | | 2 | −1706.09 | 3416.18 | 96.86 | 9.27 × 10⁻²² |
| PCR 3,6,18 | | 3 | −1706.08 | 3418.16 | 98.84 | 3.45 × 10⁻²² |
| reads (2) by PCR 3,6,18 | 0–5 versus 5+ reads | 6 | −1675.28 | 3362.58 | 43.26 | 4.03 × 10⁻¹⁰ |
| reads (3) by PCR 3,6,18 | 0–5 versus 5–10 versus 10+ reads | 9 | −1658.37 | 3334.79 | 15.47 | 0.0004 |
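The quantities named in the table captions can be reproduced with a short sketch. It uses the standard finite-sample correction AICc = 2K − 2 LogL + 2K(K + 1)/(n − K − 1) and assumes that the "Post. Prob." column is the Akaike weight, exp(−dAICc/2) normalised over all candidate models; the captions do not state this explicitly, so treat it as an assumption. The function names are illustrative only.

```python
import math

def aicc(log_likelihood, k, n):
    """Finite-sample corrected AIC for a model with k parameters and n observations."""
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def akaike_weights(aicc_scores):
    """Return (dAICc, weights), where the weights are assumed to correspond
    to the 'Post. Prob.' column (an assumption, not stated in the source)."""
    best = min(aicc_scores)
    deltas = [score - best for score in aicc_scores]
    rel_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihoods)
    return deltas, [r / total for r in rel_likelihoods]

# AICc column of the 454-versus-ABI correspondence table, in row order.
table_aicc = [3451.28, 3391.87, 3358.10, 3397.11, 3347.59,
              3319.32, 3416.18, 3418.16, 3362.58, 3334.79]
deltas, weights = akaike_weights(table_aicc)

# First row of the read-yield table: K = 3, LogL = -87627, n = 90194.
pcr_aicc = aicc(-87627, 3, 90194)
```

Under that assumption, the weights computed from the AICc column agree with the tabulated posterior probabilities to the precision shown, with nearly all support falling on the "reads (3) by MST type" model.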
Figure 2. Correspondence of the GBS microsatellite data with ABI data for the full dataset. The y-axis represents the percentage of corresponding genotype calls of all genotype calls. The x-axis represents the increasing minimum threshold of read depth required for making a genotype call. The applied thresholds were 5, 10, 50, 100, 150, 200, 250, 300, 400 and 500 reads.
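The calculation behind such a curve can be sketched as follows, assuming that a genotype call is retained when its read depth is at least the threshold and that two calls correspond when their allele pairs are identical regardless of order. The record layout, the function name, and the toy data are hypothetical; only the list of thresholds comes from the caption.

```python
THRESHOLDS = [5, 10, 50, 100, 150, 200, 250, 300, 400, 500]

def concordance_by_threshold(calls, thresholds=THRESHOLDS):
    """Percentage of GBS calls matching the ABI call at each minimum read depth.

    calls: iterable of (read_depth, gbs_genotype, abi_genotype), where each
    genotype is a pair of allele sizes. Returns {threshold: percent}, with
    None where no call passes the threshold.
    """
    result = {}
    for threshold in thresholds:
        kept = [(gbs, abi) for depth, gbs, abi in calls if depth >= threshold]
        if not kept:
            result[threshold] = None
            continue
        # Sort alleles so that (100, 104) and (104, 100) count as the same genotype.
        matches = sum(sorted(gbs) == sorted(abi) for gbs, abi in kept)
        result[threshold] = 100.0 * matches / len(kept)
    return result

# Toy data (hypothetical): (read depth, GBS genotype, ABI genotype).
toy_calls = [
    (3, (100, 104), (100, 108)),
    (7, (100, 100), (100, 104)),
    (12, (104, 100), (100, 104)),
    (60, (96, 96), (96, 96)),
]
curve = concordance_by_threshold(toy_calls)
```

In this toy dataset the mismatching calls have low read depth, so raising the threshold removes them and the percentage of corresponding calls rises, at the cost of fewer retained calls.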