Evsey Kosman, Jukka Jokela.
Abstract
Microsatellites (simple sequence repeats, SSRs) remain popular molecular markers for studying neutral genetic variation. Two alternative models outline how new microsatellite alleles evolve. The infinite alleles model (IAM) assumes that all possible alleles are equally likely to result from a mutation, while the stepwise mutation model (SMM) describes microsatellite evolution as the stepwise addition or subtraction of single repeat units. Genetic relationships between individuals can be analyzed with higher precision when assuming the SMM scenario, with allele size differences as a proxy for genetic distance. If population structure is not known in advance, an empirical data analysis usually includes (a) estimating proximity between individual SSR profiles with a selected dissimilarity measure and (b) determining the putative genetic structure of a given set of individuals by applying clustering and/or ordination methods to the resulting dissimilarity matrix. We developed new dissimilarity indices between SSR profiles of haploid, diploid, or polyploid organisms assuming different mutation models and compared the performance of these indices for determining genetic structure with population data and with simulations. More specifically, we compared SMM with a constant or variable mutation rate at different SSR loci to IAM using data from natural populations of a freshwater bryozoan Cristatella mucedo (diploid), wheat leaf rust Puccinia triticina (dikaryon), and wheat powdery mildew Blumeria graminis (monokaryon). We show that inferences about population genetic structure are sensitive to the assumed mutation model. With simulations, we found that Bruvo's distance generally performs poorly, while the new metrics capture the differences in the genetic structure of the populations.
Keywords: Bruvo's distance; SSR markers; genetic dissimilarity of individuals; infinite alleles model; population structure; stepwise mutation model
Year: 2019 PMID: 31015986 PMCID: PMC6467862 DOI: 10.1002/ece3.5032
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
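The two-step workflow described in the abstract (a dissimilarity matrix between SSR profiles, then clustering) can be illustrated with a minimal Python sketch. It is not the authors' method: the paper's Equations 4 to 7 are not reproduced on this page, so the two functions below are simplified stand-ins for haploid profiles only. The names `iam_dissimilarity` and `stepwise_dissimilarity` and the toy allele sizes are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def iam_dissimilarity(profiles):
    """IAM-like stand-in: share of loci at which two haploid profiles differ."""
    p = np.asarray(profiles)
    return (p[:, None, :] != p[None, :, :]).mean(axis=2)


def stepwise_dissimilarity(profiles, repeat_sizes):
    """SMM-like stand-in: mean absolute allele difference, in repeat units."""
    p = np.asarray(profiles, dtype=float)
    r = np.asarray(repeat_sizes, dtype=float)
    return (np.abs(p[:, None, :] - p[None, :, :]) / r).mean(axis=2)


# Hypothetical haploid profiles: allele sizes in nucleotides at three loci.
profiles = [
    [197, 242, 188],
    [199, 242, 188],
    [229, 270, 221],
    [227, 270, 218],
]
repeat_sizes = [2, 2, 3]  # nucleotides per repeat unit at each locus

d_iam = iam_dissimilarity(profiles)
d_smm = stepwise_dissimilarity(profiles, repeat_sizes)

# Step (b): UPGMA (average linkage) on the condensed dissimilarity matrix.
tree = linkage(squareform(d_smm, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
```

In this toy example the IAM-like measure treats a one-repeat difference and a sixteen-repeat difference at a locus identically, whereas the stepwise measure scales with the size difference, which is the contrast the abstract draws between the two models.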
Summary of linear models where known relatedness between pairs of genotypes (either in mutation or generation number) is predicted with genetic dissimilarity measures under different scenarios of SSR evolution
| Simulation attributes | | | Goodness of fit estimates | | | |
|---|---|---|---|---|---|---|
| Scenario | Difference between alleles: predicted | Difference between alleles: actual number of | RMSE | CV(RMSE) | MAE | R² |
| SMMc_1 | | Generations | 0.092 | 0.304 | 0.075 | 0.925 |
| | | | 0.041 | 0.366 | 0.030 | 0.921 |
| | | Mutations | 0.094 | 0.311 | 0.077 | 0.921 |
| | | | 0.043 | 0.377 | 0.031 | 0.915 |
| SMMc_2 | | Generations | 0.093 | 0.326 | 0.077 | 0.914 |
| | | | 0.040 | 0.404 | 0.029 | 0.903 |
| SMMv | | Generations | 0.107 | 0.364 | 0.089 | 0.889 |
| | | | 0.043 | 0.423 | 0.032 | 0.885 |
| | | Mutations | 0.110 | 0.376 | 0.091 | 0.882 |
| | | | 0.045 | 0.441 | 0.034 | 0.874 |
Models were forced through a zero intercept. SMMc_1 is a stepwise mutation model simulation with a constant mutation rate across loci after an average of 691 generations of evolution (max = 1,362, min = 5); SMMc_2 is a stepwise mutation model simulation with a constant mutation rate across loci after an average of 456 generations of evolution (max = 891, min = 2); SMMv is a stepwise mutation model simulation with a variable mutation rate across loci after an average of 254 generations of evolution (max = 518, min = 1). In each simulation, a population of 100 individuals was sampled from a single haploid pedigree.
Dissimilarities for the SMMc (Equations 5 and 5′).
Dissimilarities for the SMMv (Equations 4 and 4′).
RMSE: root‐mean‐square error; CV(RMSE): coefficient of variation of the RMSE; MAE: mean absolute error; R²: R‐square criterion.
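The goodness-of-fit estimates named in the notes above can be computed for a zero-intercept linear model as in the following hedged sketch. The definitions are assumptions, since the page does not state them: CV(RMSE) is taken as RMSE divided by the mean of the response, and R² is the uncentered form often reported for regression through the origin. The function name `zero_intercept_fit` and the toy numbers are hypothetical.

```python
import numpy as np


def zero_intercept_fit(x, y):
    """Least-squares fit of y = slope * x with common goodness-of-fit estimates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = (x @ y) / (x @ x)          # closed-form slope through the origin
    resid = y - slope * x
    rmse = np.sqrt(np.mean(resid ** 2))
    return {
        "slope": slope,
        "rmse": rmse,
        "cv_rmse": rmse / y.mean(),    # assumed definition: RMSE / mean(y)
        "mae": np.mean(np.abs(resid)),
        "r2": 1.0 - (resid @ resid) / (y @ y),  # assumed: uncentered R-square
    }


# Toy data: x is a dissimilarity, y a known relatedness (hypothetical values).
fit = zero_intercept_fit([1.0, 2.0], [1.0, 3.0])
```

With these definitions MAE can never exceed RMSE, which matches the ordering of the two columns in every row of the table.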
SSR allele composition of Cristatella mucedo population (197 colonies)
| Locus | Repeat size | Missing data | Allele size, min (nucleotides) | Allele size, max (nucleotides) | Max difference between alleles (tandem repeats) | Number of alleles | Proportion of homozygotes |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 0 | 197 | 229 | 16 | 9 | 0 |
| 2 | 2 | 4 | 242 | 270 | 14 | 8 | 0.28 |
| 3 | 2 | 1 | 207 | 309 | 51 | 7 | 0.08 |
| 4 | 2 | 0 | 102 | 194 | 46 | 12 | 0.22 |
| 5 | 3 | 0 | 188 | 221 | 11 | 7 | 0.06 |
| 6 | 2 | 0 | 194 | 208 | 7 | 7 | 0 |
| 7 | 2 | 0 | 244 | 254 | 5 | 4 | 0.22 |
| 8 | 2 | 0 | 154 | 208 | 27 | 11 | 0.02 |
Allele size is given as the number of nucleotides; max difference between alleles is given as the number of tandem repeats.
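The two units in this table are linked by simple arithmetic: the maximum difference in tandem repeats appears to equal the allele size range in nucleotides divided by the repeat size. That relationship is an inference from the numbers rather than something stated on the page, and the short check below (with a hypothetical helper `max_repeat_difference`) confirms it for every Cristatella mucedo locus listed above.

```python
def max_repeat_difference(min_size, max_size, repeat_size):
    """Convert an allele size range in nucleotides to whole tandem repeats."""
    span = max_size - min_size
    if span % repeat_size != 0:
        raise ValueError("size range is not a whole number of repeats")
    return span // repeat_size


# (locus, repeat size, min allele size, max allele size, reported max difference)
# Values copied from the Cristatella mucedo table.
cristatella_loci = [
    (1, 2, 197, 229, 16),
    (2, 2, 242, 270, 14),
    (3, 2, 207, 309, 51),
    (4, 2, 102, 194, 46),
    (5, 3, 188, 221, 11),
    (6, 2, 194, 208, 7),
    (7, 2, 244, 254, 5),
    (8, 2, 154, 208, 27),
]

computed = [max_repeat_difference(lo, hi, rep) for _, rep, lo, hi, _ in cristatella_loci]
reported = [row[4] for row in cristatella_loci]
```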
SSR allele composition of 192 isolates of Puccinia triticina Eriks
| Locus | Repeat size | Missing data | Allele size, min (nucleotides) | Allele size, max (nucleotides) | Max difference between alleles (tandem repeats) | Number of alleles | Proportion of homozygotes |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 0 | 127 | 131 | 2 | 3 | 0.90 |
| 2 | 2 | 0 | 365 | 369 | 2 | 3 | 0.86 |
| 3 | 2 | 0 | 306 | 310 | 2 | 3 | 0.99 |
| 4 | 2 | 0 | 296 | 302 | 3 | 3 | 0.31 |
| 5 | 2 | 0 | 391 | 395 | 2 | 3 | 0.99 |
| 6 | 2 | 0 | 383 | 387 | 2 | 3 | 0.87 |
| 7 | 2 | 0 | 245 | 247 | 1 | 2 | 0.49 |
| 8 | 3 | 0 | 476 | 479 | 1 | 2 | 0.79 |
| 9 | 2 | 0 | 392 | 396 | 2 | 2 | 0.41 |
| 10 | 3 | 0 | 233 | 242 | 3 | 4 | 0.20 |
| 11 | 2 | 0 | 216 | 218 | 1 | 2 | 0.15 |
| 12 | 2 | 0 | 215 | 217 | 1 | 2 | 0.36 |
| 13 | 2 | 0 | 211 | 215 | 2 | 3 | 0.42 |
| 14 | 3 | 0 | 344 | 350 | 2 | 3 | 0.73 |
| 15 | 3 | 0 | 150 | 153 | 1 | 2 | 0.96 |
| 16 | 2 | 0 | 349 | 351 | 1 | 2 | 0.93 |
| 17 | 2 | 0 | 244 | 246 | 1 | 2 | 0.56 |
| 18 | 2 | 0 | 313 | 333 | 10 | 4 | 0.59 |
Allele size is given as the number of nucleotides; max difference between alleles is given as the number of tandem repeats.
SSR allele composition of 57 isolates of Blumeria graminis f. sp. tritici
| Locus | Repeat size | Missing data | Allele size, min (nucleotides) | Allele size, max (nucleotides) | Max difference between alleles (tandem repeats) | Number of alleles |
|---|---|---|---|---|---|---|
| 1 | 3 | 14 | 155 | 509 | 118 | 28 |
| 2 | 4 | 5 | 276 | 284 | 2 | 3 |
| 3 | 2 | 1 | 180 | 202 | 11 | 11 |
| 4 | 3 | 4 | 243 | 303 | 20 | 10 |
| 5 | 4 | 1 | 153 | 165 | 3 | 4 |
| 6 | 4 | 3 | 192 | 260 | 17 | 10 |
| 7 | 3 | 4 | 266 | 560 | 98 | 27 |
Allele size is given as the number of nucleotides; max difference between alleles is given as the number of tandem repeats.
Association between original dissimilarity matrices (below diagonal) and cophenetic ultrametric distances for UPGMA dendrograms obtained with the corresponding dissimilarities (above diagonal), measured with Mantel tests for (a) Cristatella mucedo population; (b) collection of Puccinia triticina isolates; and (c) collection of Blumeria graminis isolates
| | IAM | MANMC | SMMc | SMMv |
|---|---|---|---|---|
| (a) | | | | |
| IAM | | 0.374 | 0.375 | 0.931 |
| MANMC | 0.59 | | 0.999 | 0.504 |
| SMMc | 0.591 | 0.999 | | 0.505 |
| SMMv | 0.896 | 0.814 | 0.814 | |
| (b) | | | | |
| IAM | | 0.729 | 0.728 | 0.818 |
| MANMC | 0.766 | | 1 | 0.665 |
| SMMc | 0.766 | 1 | | 0.665 |
| SMMv | 0.954 | 0.805 | 0.805 | |
| (c) | | | | |
| IAM | | 0.229 | 0.253 | 0.508 |
| MANMC | 0.401 | | 0.847 | 0.641 |
| SMMc | 0.309 | 0.876 | | 0.718 |
| SMMv | 0.616 | 0.698 | 0.765 | |
IAM: dissimilarity for the infinite alleles model (Equation 6); MANMC: dissimilarity (minimum average number of mutations per a copy of haploid genome; Equation 7); SMMc: dissimilarity for the stepwise mutation model with a constant rate of mutations (Equation 5); SMMv: dissimilarity for the stepwise mutation model with a variable rate of mutations (Equation 4).
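A Mantel-type comparison like the one summarised in this table can be sketched as follows. This is a generic permutation Mantel test using the Pearson correlation, applied to a distance matrix and the cophenetic distances of its UPGMA dendrogram. The software and settings used by the authors are not given on this page, so the function `mantel`, the number of permutations, and the random toy data are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform


def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel statistic and one-sided permutation p-value."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)       # upper triangle, no diagonal
    observed = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        perm = rng.permutation(d1.shape[0])  # relabel rows and columns jointly
        permuted = d1[np.ix_(perm, perm)]
        if np.corrcoef(permuted[iu], d2[iu])[0, 1] >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy data standing in for a dissimilarity matrix between 12 individuals.
points = np.random.default_rng(1).normal(size=(12, 3))
d = squareform(pdist(points))

# UPGMA dendrogram and its cophenetic (ultrametric) distance matrix.
upgma = linkage(squareform(d, checks=False), method="average")
coph = squareform(cophenet(upgma))

r, p_value = mantel(d, coph)
```

The statistic `r` for a matrix and its own cophenetic distances equals the cophenetic correlation coefficient; comparing two different dissimilarity matrices, or the cophenetic matrices of two dendrograms, only changes the arguments passed to `mantel`.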
Figure 1. Comparison of UPGMA trees calculated for the Cristatella mucedo dataset using different pairwise dissimilarity matrices. Normalized symmetric difference (Robinson–Foulds distance) reports the proportion of partitions that are not shared between the trees, while the Branch Score Difference is a measure of branch length differences between the two trees (Steel & Penny, 1993). (a) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and IAM. (b) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and SMM with variable mutation rate. (c) Comparison between UPGMA trees calculated assuming SMM with variable mutation rate and IAM
Figure 2. Comparison of UPGMA trees calculated for the leaf rust (Puccinia triticina) dataset using different pairwise dissimilarity matrices. Normalized symmetric difference (Robinson–Foulds distance) reports the proportion of partitions that are not shared between the trees, while the Branch Score Difference is a measure of branch length differences between the two trees (Steel & Penny, 1993). (a) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and IAM. (b) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and SMM with variable mutation rate. (c) Comparison between UPGMA trees calculated assuming SMM with variable mutation rate and IAM
Figure 3. Comparison of UPGMA trees calculated for the powdery mildew (Blumeria graminis) dataset using different pairwise dissimilarity matrices. Normalized symmetric difference (Robinson–Foulds distance) reports the proportion of partitions that are not shared between the trees, while the Branch Score Difference is a measure of branch length differences between the two trees (Steel & Penny, 1993). (a) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and IAM. (b) Comparison between UPGMA trees calculated assuming SMM with constant mutation rate and SMM with variable mutation rate. (c) Comparison between UPGMA trees calculated assuming SMM with variable mutation rate and IAM
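The normalized symmetric difference mentioned in the figure captions can be illustrated with a small sketch. It uses a rooted-clade variant (comparing the sets of leaves below each internal node of two UPGMA trees), which is an assumption and may differ from the implementation behind the figures. The Branch Score Difference is not covered. The helper names `clades` and `normalized_rf` and the toy one-dimensional data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import pdist


def clades(Z):
    """Non-trivial clades (leaf sets of internal nodes, root excluded)."""
    root = to_tree(Z)
    n = root.get_count()
    found = set()

    def walk(node):
        if node.is_leaf():
            return frozenset([node.get_id()])
        leaves = walk(node.get_left()) | walk(node.get_right())
        if len(leaves) < n:  # skip the root, which every tree shares
            found.add(leaves)
        return leaves

    walk(root)
    return found


def normalized_rf(z1, z2):
    """Proportion of clades not shared between two trees on the same leaves."""
    a, b = clades(z1), clades(z2)
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("trees need at least three leaves")
    return len(a ^ b) / total


# Toy data: four individuals placed on a line in two different arrangements.
tree_a = linkage(pdist(np.array([[0.0], [1.0], [10.0], [11.0]])), method="average")
tree_b = linkage(pdist(np.array([[0.0], [10.0], [1.0], [11.0]])), method="average")

same = normalized_rf(tree_a, tree_a)       # identical topologies
different = normalized_rf(tree_a, tree_b)  # no shared non-trivial clade
```

A value of 0 means the two trees share every partition, and a value of 1 means they share none, matching the caption's description of the measure as a proportion of unshared partitions.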