Hui Wang1, Shenghan Gao2, Yu Liu1, Pengcheng Wang3, Zhengwang Zhang1.
Abstract
Simple sequence repeats (SSRs) remain widely used genetic markers in ecology, evolution, and conservation even in the genomics era, but their application is generally limited by the difficulty of developing polymorphic SSR markers. Next-generation sequencing (NGS) offers the opportunity for the rapid development of SSRs; however, previous studies that developed SSRs from the genomic data of only one individual required redundant experiments to test the polymorphism of the SSRs. In this study, we designed a pipeline for the rapid development of polymorphic SSR markers from multi-sample genomic data. We used bioinformatic software to genotype multiple individuals from resequencing data and detected highly polymorphic SSRs prior to experimental validation, which significantly improved efficiency and reduced experimental effort. The pipeline was successfully applied to a globally threatened species, the brown eared-pheasant (Crossoptilon mantchuricum), which shows very low genomic diversity. The 20 newly developed SSR markers were highly polymorphic, with an average number of alleles much higher than the genomic average. We also evaluated the effect of the number of individuals and sequencing depth on the SSR mining results, and we found that 10 individuals and ~10X sequencing data were enough to obtain a sufficient number of polymorphic SSRs, even for species with low genetic diversity. Furthermore, a genome assembled from the NGS data of the optimal number of individuals at the optimal sequencing depth can be used as an alternative reference genome if a high-quality genome is not available. Our pipeline provides a paradigm for the application of NGS technology to mining and developing molecular markers for ecological and evolutionary studies.
Keywords: microsatellite; molecular marker; next‐generation sequencing; resequencing; short tandem repeats; threatened species
Year: 2022 PMID: 35342577 PMCID: PMC8928897 DOI: 10.1002/ece3.8705
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
FIGURE 1 Workflow for in silico microsatellite mining, polymorphism discovery, and primer design using a series of commonly used software programs (shown in italics). The pipeline takes multi‐sample resequencing data in FASTQ format and a reference genome in FASTA format as input (for species without an available reference genome, one can be generated from the multi‐sample resequencing data with assembly software such as MaSuRCA).
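The polymorphism discovery step of this workflow can be illustrated with a minimal sketch. It assumes that per-individual SSR genotypes have already been called by upstream software and loaded into a plain dictionary; the data layout, the locus names, and the functions `count_alleles` and `polymorphic_loci` are hypothetical and are not part of the published pipeline.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: each genotype is the repeat count of the two alleles,
# or None when the locus could not be called in that individual.
Genotype = Optional[Tuple[int, int]]


def count_alleles(genotypes: List[Genotype]) -> int:
    """Number of distinct alleles observed at one locus across individuals."""
    alleles = set()
    for g in genotypes:
        if g is not None:
            alleles.update(g)
    return len(alleles)


def polymorphic_loci(
    calls: Dict[str, List[Genotype]],
    min_alleles: int = 2,
    min_called: int = 1,
) -> Dict[str, int]:
    """Keep loci with enough called individuals and enough distinct alleles."""
    selected = {}
    for locus, genotypes in calls.items():
        called = sum(g is not None for g in genotypes)
        na = count_alleles(genotypes)
        if called >= min_called and na >= min_alleles:
            selected[locus] = na
    return selected


# Toy data (invented for illustration only).
calls = {
    "locus_A": [(10, 10), (10, 12), (11, 12), None],
    "locus_B": [(8, 8), (8, 8), (8, 8), (8, 8)],
}
selected = polymorphic_loci(calls)
```

Raising `min_alleles` would restrict the candidates passed on to primer design to the most variable loci, which is the idea behind screening for polymorphism before any laboratory validation.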
FIGURE 2 Distributions of SSR types and the number of alleles (Na) of 20 C. mantchuricum individuals. (a) All SSR loci. (b) Polymorphic SSR loci
Summary of the observed allele number (Na), sample size (N), observed and expected heterozygosity (Ho and He), and polymorphism information content (PIC) for 30 individuals of brown eared‐pheasants
| No. | Marker Name | Na | N | Ho | He | PIC |
|---|---|---|---|---|---|---|
| 1 | CM1 | 3 | 30 | 0.300 | 0.605 | 0.528 |
| 2 | CM2 | 5 | 30 | 0.567 | 0.726 | 0.667 |
| 3 | CM3 | 3 | 30 | 0.033 | 0.406 | 0.332 |
| 4 | CM7 | 4 | 30 | 0.167 | 0.547 | 0.475 |
| 5 | CM8 | 6 | 30 | 0.400 | 0.714 | 0.658 |
| 6 | CM9 | 3 | 30 | 0.400 | 0.453 | 0.381 |
| 7 | CM10 | 5 | 30 | 0.567 | 0.692 | 0.624 |
| 8 | CM11 | 4 | 30 | 0.233 | 0.551 | 0.481 |
| 9 | CM12 | 3 | 30 | 0.267 | 0.473 | 0.420 |
| 10 | CM14 | 5 | 30 | 0.567 | 0.744 | 0.684 |
| 11 | CM15 | 3 | 30 | 0.167 | 0.581 | 0.508 |
| 12 | CM16 | 5 | 30 | 0.567 | 0.724 | 0.663 |
| 13 | CM19 | 2 | 30 | 0.000 | 0.398 | 0.315 |
| 14 | CM20 | 4 | 29 | 0.103 | 0.470 | 0.423 |
| 15 | CM25 | 4 | 30 | 0.400 | 0.481 | 0.437 |
| 16 | CM26 | 4 | 30 | 0.300 | 0.584 | 0.513 |
| 17 | CM27 | 6 | 29 | 0.276 | 0.629 | 0.562 |
| 18 | CM30 | 6 | 30 | 0.633 | 0.733 | 0.675 |
| 19 | CM32 | 5 | 30 | 0.533 | 0.760 | 0.707 |
| 20 | CM33 | 4 | 28 | 0.429 | 0.660 | 0.581 |
| Mean | | 4.2 | 29.8 | 0.345 | 0.597 | 0.532 |
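The statistics in this table can be computed from diploid genotypes as in the sketch below. It rests on assumptions: the source does not state which estimator or software produced the values, so the sketch uses the common definitions, with He as one minus the sum of squared allele frequencies (optionally with the small-sample correction 2N/(2N − 1)) and PIC following the formula of Botstein et al. (1980). The function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes, unbiased=True):
    """Return Na, N, Ho, He and PIC for one locus.

    genotypes: list of (allele1, allele2) tuples, or None for missing data.
    unbiased: apply the 2N/(2N - 1) small-sample correction to He (assumption).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this locus")

    counts = Counter(allele for g in called for allele in g)
    total = 2 * n  # number of gene copies sampled
    freqs = [c / total for c in counts.values()]

    ho = sum(a != b for a, b in called) / n  # observed heterozygosity
    hom = sum(p * p for p in freqs)          # expected homozygosity
    he = 1.0 - hom
    if unbiased:
        he *= total / (total - 1)

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    pic = 1.0 - hom - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"Na": len(counts), "N": n, "Ho": ho, "He": he, "PIC": pic}


# Toy genotypes (invented): two alleles at equal frequency.
stats = locus_stats([(1, 1), (1, 2), (2, 2), (1, 2)])
```

For a locus with two alleles at frequencies p and q, the PIC expression reduces to 2pq − 2p²q², so PIC is always somewhat lower than the uncorrected He, as it is for every marker in the table.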
FIGURE 3 Population structure and principal coordinate analysis (PCoA) of 30 brown eared‐pheasants based on 20 SSR markers. (a) Population structure of K = 2 and K = 3 inferred by Bayesian clustering approaches. Samples of 30 brown eared‐pheasants were from Shanxi (n = 15; 1–15), Shaanxi (n = 7; 16–22), Hebei and Beijing (Hebei: n = 6, Beijing: n = 2; 23–30). (b) Principal coordinate analysis (PCoA) of 30 brown eared‐pheasants. CM‐C: Shanxi (n = 15; green); CM‐W: Shaanxi (n = 7; blue); CM‐E: Hebei and Beijing (Hebei: n = 6, Beijing: n = 2; red)
FIGURE 4 Increase in the number of alleles (solid line) and the number of polymorphic SSRs (dotted line) with (a) the number of individuals and (b) sequencing depth
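A curve like the one in panel (a) can be approximated by repeatedly subsampling individuals and counting the loci that remain polymorphic. The sketch below is only an illustration under assumptions: the genotype dictionary, the function names `count_polymorphic` and `subsample_curve`, the replicate count, and the toy data are all hypothetical, and the source does not describe how its own curves were generated.

```python
import random


def count_polymorphic(calls, individuals):
    """Count loci showing at least two alleles among the chosen individuals."""
    n_poly = 0
    for genotypes in calls.values():
        alleles = set()
        for i in individuals:
            g = genotypes[i]
            if g is not None:  # skip missing genotype calls
                alleles.update(g)
        if len(alleles) >= 2:
            n_poly += 1
    return n_poly


def subsample_curve(calls, n_individuals, sizes, replicates=100, seed=0):
    """Mean number of polymorphic loci for each subsample size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = {}
    for k in sizes:
        totals = [
            count_polymorphic(calls, rng.sample(range(n_individuals), k))
            for _ in range(replicates)
        ]
        curve[k] = sum(totals) / replicates
    return curve


# Toy data (invented): two loci genotyped in four individuals.
calls = {
    "locus_A": [(10, 10), (10, 12), (11, 12), None],
    "locus_B": [(8, 8), (8, 8), (8, 8), (8, 8)],
}
curve = subsample_curve(calls, n_individuals=4, sizes=[1, 2, 3, 4])
```

The point at which such a curve flattens suggests how many individuals are worth sequencing before additional samples stop revealing new polymorphic loci.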
A comparison of different SSR marker development methods, including species, SSR marker development method (Tra‐NGS: traditional NGS method based on one individual), number of PCR primers tested (Pri), number of amplifiable PCR primers (Amp), percentage of primers that were amplifiable (Amp/Pri), number of primers selected to test polymorphism (Amp‐sel), number of polymorphic primers (Pol), percentage of selected amplifiable primers that were polymorphic (Pol/Amp‐sel), percentage of primers that were both amplifiable and polymorphic (Suc), and literature reference (Ref)
| Species | Method | Pri | Amp | Amp/Pri | Amp‐sel | Pol | Pol/Amp‐sel | Suc | Ref |
|---|---|---|---|---|---|---|---|---|---|
| | Tra‐NGS | 118 | 118 | 100% | 118 | 6 | 5% | 5% | Zhu ( |
| | Tra‐NGS | 600 | 99 | 17% | 52 | 24 | 46% | 8% | Yang et al. ( |
| | Tra‐NGS | 144 | 143 | 99% | 143 | 49 | 34% | 34% | Koshiishi et al. ( |
| Crossoptilon mantchuricum | This study | 34 | 30 | 88% | 20 | 20 | 100% | 88% | This study |
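The percentage columns of this comparison can be recomputed from the raw counts. The sketch below assumes that Suc is the product of Amp/Pri and Pol/Amp‐sel; the caption does not give the formula explicitly, but this reading appears consistent with the tabulated values. The function name `success_rates` is hypothetical.

```python
def success_rates(pri, amp, amp_sel, pol):
    """Return (Amp/Pri, Pol/Amp-sel, Suc) as fractions.

    Suc is taken here as the product of the two rates, which is an
    assumption about how the table column was derived.
    """
    amp_rate = amp / pri
    pol_rate = pol / amp_sel
    return amp_rate, pol_rate, amp_rate * pol_rate


# Counts copied from the table: (Pri, Amp, Amp-sel, Pol).
rows = {
    "Zhu": (118, 118, 118, 6),
    "Yang et al.": (600, 99, 52, 24),
    "Koshiishi et al.": (144, 143, 143, 49),
    "This study": (34, 30, 20, 20),
}

rates = {name: success_rates(*counts) for name, counts in rows.items()}
for name, (amp_rate, pol_rate, suc) in rates.items():
    print(f"{name}: Amp/Pri={amp_rate:.3f} Pol/Amp-sel={pol_rate:.3f} Suc={suc:.3f}")
```

Under this reading, the gain of the multi-sample approach comes almost entirely from the Pol/Amp‐sel term: every primer pair selected for polymorphism testing in this study turned out to be polymorphic.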