Olivier Lepais, Emilie Chancerel, Christophe Boury, Franck Salin, Aurélie Manicki, Laura Taillebois, Cyril Dutech, Abdeldjalil Aissi, Cecile F E Bacles, Françoise Daverat, Sophie Launey, Erwan Guichoux.
Abstract
Application of high-throughput sequencing technologies to microsatellite genotyping (SSRseq) has been shown to remove many of the limitations of electrophoresis-based methods and to refine inference of population genetic diversity and structure. We present here a streamlined SSRseq development workflow that includes microsatellite development, multiplexed marker amplification and sequencing, and automated bioinformatics data analysis. We illustrate its application to five groups of species across phyla (fungi, plant, insect and fish) with different levels of genomic resource availability. We found that relying on previously developed microsatellite assays is not optimal and results in a low number of reliably genotyped loci. In contrast, de novo ad hoc primer design yields highly multiplexed microsatellite assays that can be sequenced to produce high-quality genotypes for 20-40 loci. We highlight critical upfront development factors to consider for effective SSRseq setup in a wide range of situations. Sequence analysis accounting for all linked polymorphisms along the sequence quickly generates a powerful multi-allelic haplotype-based genotypic dataset, calling for new theoretical and analytical frameworks to extract more information from multi-nucleotide polymorphism marker systems. ©2020 Lepais et al.
Keywords: HapSTR; Haplotype sequence; SNPSTR; SSR-GBS; SSR-seq; Sequence-based microsatellite genotyping
Year: 2020 PMID: 32411534 PMCID: PMC7204839 DOI: 10.7717/peerj.9085
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
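The abstract contrasts haplotype-based genotypes with classic length-based calls. As a rough illustration only, and not the FDSTools procedure used in the study, the sketch below calls a diploid genotype at one locus by treating each distinct full-length read as a candidate allele, so that linked SNPs, indels and repeat-length differences stay together in one haplotype. The function name and the thresholds (`min_reads`, `min_fraction`) are hypothetical.

```python
from collections import Counter


def call_haplotype_genotype(reads, min_reads=20, min_fraction=0.3):
    """Call a diploid genotype from the reads of one individual at one locus.

    Hypothetical sketch: each distinct full-length read sequence is a candidate
    allele (haplotype); thresholds are illustrative, not values from the study.
    Returns a sorted allele pair, or None when the call is treated as missing.
    """
    counts = Counter(reads)
    if sum(counts.values()) < min_reads:
        return None  # too few reads: missing data
    top_count = counts.most_common(1)[0][1]
    # Keep sequences supported by at least `min_fraction` of the top allele's
    # reads; weaker sequences are treated as noise (e.g. stutter, PCR errors).
    alleles = sorted(seq for seq, n in counts.items() if n >= min_fraction * top_count)
    if len(alleles) > 2:
        return None  # more candidates than a diploid can carry: ambiguous
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # homozygote
    return (alleles[0], alleles[1])


a1 = "TTG" + "CA" * 5 + "GGA"
a2 = "TTG" + "CA" * 5 + "GTA"     # same length as a1, one flanking SNP
noise = "TTG" + "CA" * 4 + "GGA"  # stutter-like read, one repeat unit shorter
reads = [a1] * 30 + [a2] * 25 + [noise] * 3
print(call_haplotype_genotype(reads) == (a1, a2))  # True
```

Because both alleles in the example have the same length, a size-based method would score this individual as homozygous, whereas the sequence-based call recovers a heterozygote.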
SSRseq development strategy and DNA characteristics of species used in this study.
| Animalia | Actinopterygii | Previously developed loci | 1,152 | Salt-chloroform | High | ||
| Plantae | Eudicots | Re-designed primers around already developed loci using reference genome sequence of closely related species | 380 | Invisorb DNA Plant HTS 96 kit | High | This study | |
| Animalia | Actinopterygii | 382 | Invitrogen PureLink Genomic DNA Mini kit | Highly degraded | |||
| Fungi | Agaricomycetes | 384 | CTAB | Heterogeneous | This study | ||
| Animalia | Insecta | 91 | Qiagen DNeasy 96 Blood & Tissue Kit | High | This study |
Figure 1. Workflow for SSRseq marker optimization or development depending on genomic resource availability, from selection to multiplexed amplification and library preparation to bioinformatics analysis.
Summary of the tested scenarios for SSRseq genotyping.
| Previously developed loci | 81 | – | 23 | Ion Torrent PGM i316 | 960 | 66 | 66 | 20 | 161 (61%) | 9 | 39% | |
| 23 | Illumina MiSeq - 1/2 nano PE | 192 | 96 | 96 | 20 | 70 (41%) | 10 | 43% | ||||
| Previously developed loci | 15 | – | 15 | Illumina MiSeq - 1/2 nano PE | 192 | 96 | 96 | 13 | 99 (65%) | 7 | 47% | |
| Primer redesign around previously developed loci | 462 | 60 | 60 | Illumina MiSeq - 1/3 V2 PE | 380 | 46 | 46 | 53 | 260 (32%) | 40 | 67% |
| 2,872 | 60 | 28 | Illumina MiSeq - 1 nano SE | 382 | 156 | 156 | 25 | 95 (58%) | 21 | 75% | ||
| 28 | Illumina MiSeq - 2 nano SE | 382 | 156 | 156 | 26 | 198 (58%) | 24 | 86% | ||||
| 28 | Illumina MiSeq - 3 nano SE | 382 | 156 | 156 | 26 | 267 (58%) | 24 | 86% | ||||
| 1,806 | 60 | 51 | Illumina MiSeq - 1/2 V2 PE | 384 | 384 | 96 | 48 | 243 (83%) | 38 | 75% | ||
| 8,937 | 60 | 54 | Illumina MiSeq - 1/4 V2 PE | 182 | 91 | 91 | 49 | 176 (45%) | 39 | 72% |
Notes.
Loci showing substantial evidence for minimum sequencing success (at least 20 sequences in at least 50% of the individuals).
Reliable loci (less than 50% missing data among individuals and a genotyping error rate below 6%, based on comparison of repeated genotyping).
Routinely genotyped using an optimized multiplexed PCR and a capillary-based sequencer.
FDSTools analysis using two parameter sets: stutterfinder -s:-1:50, +1:10 allelefinder -m 15 -n 20; and stuttermark -s:-1:70, +1:10 allelefinder -m 10 -n 20. For each marker, four parameter combinations were used (two strategies and two parameter sets), and for each strategy the best parameter set was used for a given locus.
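The two locus-level criteria in the notes above (minimum sequencing success, then reliability) are simple thresholds. A minimal sketch follows, assuming per-individual read counts and genotype calls are already available for a locus. The function names are hypothetical, and the genotyping error is approximated here as the proportion of mismatching alleles between replicate genotypings, which may differ from the exact error metric used in the study.

```python
from collections import Counter


def passes_sequencing_success(read_counts, min_reads=20, min_share=0.5):
    """True if at least `min_share` of individuals have >= `min_reads` reads."""
    if not read_counts:
        return False
    n_ok = sum(1 for n in read_counts if n >= min_reads)
    return n_ok / len(read_counts) >= min_share


def missing_rate(genotypes):
    """Share of individuals with no genotype call (None)."""
    if not genotypes:
        return 1.0
    return sum(g is None for g in genotypes) / len(genotypes)


def allelic_error_rate(replicate_pairs):
    """Share of alleles that differ between two genotypings of the same sample.

    Assumed metric: pairs with a missing call are skipped, and alleles are
    matched as multisets, so (A, B) vs (B, C) counts as one mismatch.
    """
    compared = mismatches = 0
    for g1, g2 in replicate_pairs:
        if g1 is None or g2 is None:
            continue
        compared += len(g1)
        mismatches += sum((Counter(g1) - Counter(g2)).values())
    return mismatches / compared if compared else None


def is_reliable(genotypes, replicate_pairs, max_missing=0.5, max_error=0.06):
    """Less than 50% missing data and less than 6% genotyping error."""
    error = allelic_error_rate(replicate_pairs)
    if error is None:
        return False  # assumption: no usable replicates, reliability unproven
    return missing_rate(genotypes) < max_missing and error < max_error


pairs = [(("A", "B"), ("A", "B")), (("A", "A"), ("A", "B")), (("C", "C"), None)]
print(passes_sequencing_success([25, 30, 5, 0]))  # True
print(allelic_error_rate(pairs))                  # 0.25
```

Applying the sequencing-success filter first and the reliability filter second mirrors the two-step ordering of the notes, where only loci with enough reads are assessed for missing data and error.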
Figure 2. Results of SSRseq development from previously developed microsatellites.
S. salar: a new multiplex of 23 microsatellites sequenced on (A) the Ion Torrent PGM and (B) the Illumina MiSeq sequencing platforms, and (C) a routinely used multiplex of 15 microsatellites sequenced on the Illumina MiSeq platform. The number of reliable loci, total number of alleles, missing data and allelic error rates are indicated for three bioinformatics analysis strategies that focused on all polymorphisms across the sequence, on polymorphisms within the repeat motif only, or on a combination of the best strategy for each locus.
Figure 3. Results of SSRseq development based on newly optimized microsatellites.
(A) Quercus sp., (B) Alosa sp., (C) A. ostoyae and (D) M. variegatipes sequenced on the Illumina MiSeq sequencing platform. The number of reliable loci, total number of alleles, missing data and allelic error rates are indicated for three bioinformatics analysis strategies that focused on all polymorphisms across the sequence, on polymorphisms within the repeat motif only, or on a combination of the best strategy for each locus.
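The figure captions compare a full-sequence strategy, a repeat-motif-only strategy, and a combination that keeps the best strategy for each locus. A minimal sketch of such a per-locus combination is shown below. The ranking rule (reliable loci only, then lowest allelic error, lowest missing data, most alleles) and the strategy labels `FullLength` and `RepeatOnly` are assumptions for illustration, not the selection procedure documented in the study.

```python
def combine_best_strategy(results, max_missing=0.5, max_error=0.06):
    """Pick, for each locus, the analysis strategy with the best metrics.

    `results` maps a strategy name to {locus: (error_rate, missing_rate, n_alleles)}.
    Hypothetical ranking: reliable calls only, then lowest error, lowest missing
    data, most alleles. Loci reliable under no strategy are dropped.
    """
    loci = set()
    for per_locus in results.values():
        loci.update(per_locus)
    combined = {}
    for locus in sorted(loci):
        candidates = []
        for strategy, per_locus in results.items():
            if locus not in per_locus:
                continue
            error, missing, n_alleles = per_locus[locus]
            if error < max_error and missing < max_missing:
                # Tuples compare element by element, so min() applies the ranking.
                candidates.append(((error, missing, -n_alleles), strategy))
        if candidates:
            combined[locus] = min(candidates)[1]
    return combined


results = {
    "FullLength": {"L1": (0.01, 0.10, 8), "L2": (0.08, 0.05, 12)},
    "RepeatOnly": {"L1": (0.01, 0.10, 5), "L2": (0.02, 0.05, 7), "L3": (0.10, 0.60, 3)},
}
print(combine_best_strategy(results))
```

Here locus L1 keeps the full-sequence call because it yields more alleles at equal error, L2 falls back to the repeat-only call because the full-sequence error exceeds 6%, and L3 is dropped as unreliable under both.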
Detected polymorphism.
| 14 | 122 | 108 | 11% | 13% | 107 (14) | 3 (3) | – | 2 (2) | 3 (3) | |
| 40 | 537 | 346 | 35% | 55% | 406 (40) | 38 (25) | 1 (1) | 47 (18) | 13 (10) | |
| 24 | 174 | 150 | 14% | 16% | 130 (23) | 13 (15) | 2 (2) | 4 (4) | 3 (3) | |
| 38 | 398 | 187 | 53% | 113% | 187 (33) | 79 (26) | 8 (7) | 312 (23) | 41 (16) | |
| 39 | 166 | 156 | 6% | 6% | 147 (39) | 3 (3) | 1 (1) | 9 (7) | 5 (5) | |
Notes.
Irrespective of polymorphism type; computed with the Combined analysis strategy.
Simulated number of alleles that would have been identified using traditional capillary electrophoresis, computed with the FullLength analysis strategy and accounting for allele size only, on the same loci as the Combined approach.
Total number of alleles (and number of loci in brackets) for each polymorphism type.
Combination of two sequence-based microsatellite genotyping protocols.
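The simulated capillary-electrophoresis allele count in the notes above rests on a simple idea: alleles that differ in sequence but not in length cannot be told apart by size. The sketch below, with a hypothetical function name and toy data, counts distinct sequence alleles and distinct allele lengths per locus and sums both across loci.

```python
def allele_counts(alleles_by_locus):
    """Return (sequence-based, length-only) allele totals summed over loci."""
    seq_total = len_total = 0
    for alleles in alleles_by_locus.values():
        distinct = set(alleles)
        seq_total += len(distinct)
        # Equal-length alleles co-migrate in capillary electrophoresis,
        # so only distinct lengths would be scored as separate alleles.
        len_total += len({len(a) for a in distinct})
    return seq_total, len_total


loci = {
    "locA": ["ACACAC", "ACATAC", "ACACACAC"],  # one SNP hidden within a size class
    "locB": ["GATAGATA", "GATAGATA"],
}
seq_alleles, size_alleles = allele_counts(loci)
print(seq_alleles, size_alleles)  # 4 3
```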
Figure 4. Proportion of detected polymorphism types within the repeat motif or in the flanking sequence for each sample, per species group.
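Figure 4 separates polymorphisms located within the repeat motif from those in the flanking sequence. A minimal sketch of that distinction is given below. It assumes, hypothetically, that each allele can be split into a left flank, a repeat region and a right flank using known, fixed flank lengths (no indels in the flanks), and compares each part with a reference allele; real data would require a sequence alignment, which this toy version deliberately avoids. The function name and category labels are illustrative.

```python
def classify_polymorphism(allele, reference, left_flank_len, right_flank_len):
    """Classify how an allele differs from a reference allele at one locus.

    Hypothetical sketch: flanks have fixed lengths, so the repeat region is
    whatever lies between them. Returns a set of polymorphism labels.
    """
    def split(seq):
        cut = len(seq) - right_flank_len
        return seq[:left_flank_len], seq[left_flank_len:cut], seq[cut:]

    a_left, a_rep, a_right = split(allele)
    r_left, r_rep, r_right = split(reference)
    types = set()
    if len(a_rep) != len(r_rep):
        types.add("repeat length")    # different number of repeat units
    elif a_rep != r_rep:
        types.add("repeat sequence")  # same length, substitution within the motif
    if a_left != r_left or a_right != r_right:
        types.add("flanking")         # variation outside the repeat region
    return types


ref = "TTG" + "CA" * 4 + "GGA"
print(sorted(classify_polymorphism("TAG" + "CA" * 5 + "GGA", ref, 3, 3)))
```

The example allele carries both an extra repeat unit and a flanking substitution, so it is reported under two categories; a size-based method would only register the length change.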