Anna M Kopps, Jungkoo Kang, William B Sherwin, Per J Palsbøll.
Abstract
Kinship analyses are important pillars of ecological and conservation genetic studies, with potentially far-reaching implications. There is a need for power analyses that address a range of possible relationships. Nevertheless, such analyses are rarely applied, and studies that infer kinship from genetic data often ignore the influence of intrinsic population characteristics. We investigated 11 questions regarding the correct classification rate of dyads to relatedness categories (relatedness category assignments; RCA) using an individual-based model with realistic life history parameters. We investigated the effects of the number of genetic markers; marker type (microsatellite, single-nucleotide polymorphism [SNP], or both); minor allele frequency; typing error; mating system; and the number of overlapping generations under different demographic conditions. We found that (i) an increasing number of genetic markers increased the correct classification rate of the RCA, so that in some scenarios more than 80% of first cousins could be correctly assigned; (ii) the minimum number of genetic markers required for assignments with 80% and 95% correct classification differed between relatedness categories, mating systems, and the number of overlapping generations; (iii) the correct classification rate was improved by adding relatedness categories and by adding age and mitochondrial DNA data; and (iv) a combination of microsatellite and single-nucleotide polymorphism data increased the correct classification rate if fewer than 800 SNP loci were available. This study shows how intrinsic population characteristics (such as mating system and the number of overlapping generations), life history traits, and genetic marker characteristics can influence the correct classification rate of an RCA study. Therefore, species-specific power analyses are essential for empirical studies.
Keywords: identity by descent (IBD); intrinsic population characteristics; pedigree reconstruction; relatedness; relatedness category assignment
Year: 2015 PMID: 26134496 PMCID: PMC4555218 DOI: 10.1534/g3.115.019323
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
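The correct classification rate used throughout is defined per category as the proportion of dyads assigned to that category that truly belong to it (question 4 in the table below). A minimal Python sketch of that metric follows, assuming true and assigned categories are available as parallel lists, as they are in a simulation; the function name and the toy data are hypothetical. The toy data also show why the unrelated category can exceed 95% trivially when most dyads are unrelated.

```python
from collections import Counter


def correct_classification_rates(true_categories, assigned_categories):
    """Per assigned category: fraction of dyads assigned to it that truly belong to it.

    Hypothetical helper for illustration; categories are plain strings such as "PO".
    """
    if len(true_categories) != len(assigned_categories):
        raise ValueError("true and assigned category lists must have equal length")
    n_assigned = Counter(assigned_categories)
    n_correct = Counter(
        assigned
        for true, assigned in zip(true_categories, assigned_categories)
        if true == assigned
    )
    return {category: n_correct[category] / n for category, n in n_assigned.items()}


# Hypothetical toy sample of 100 dyads, 95 of them unrelated.
truth = ["unrel"] * 95 + ["PO"] * 3 + ["FS"] * 2
# A useless assignment that calls every dyad unrelated still scores 0.95 for "unrel".
print(correct_classification_rates(truth, ["unrel"] * 100))  # {'unrel': 0.95}
```

Because the rate is computed over assigned dyads, it behaves like precision: it penalizes false positives in a category but does not directly report how many true members of that category were assigned elsewhere.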
Questions, reasoning and results
| No. | Questions | Reasoning | Results |
|---|---|---|---|
| 1 | How much does the correct classification rate of an RCA increase with the number of loci used? | It is hypothesized that an increased number of loci (or alleles) increases informativeness, and thus that the larger the number of loci, the higher the correct classification rate of the assignment ( | Increasing the number of SNP and STR loci (see heights of bars with correct color in |
| 2 | How do correct classification rates differ between categories with different degrees of relatedness? | With decreasing relatedness, average allele sharing is expected to decrease, while its variance increases. On average, PO and FS dyads share one half, R = 0.25 dyads one quarter, and R = 0.125 dyads one eighth of their genome IBD. Thus, the differences in mean expected allele sharing between adjacent categories shrink as relatedness decreases, making more distantly related categories more prone to misdiagnosis. PO dyads may be prone to being misdiagnosed as FS dyads, or | Categories of more closely related dyads were assigned with higher correct classification rates, or reached >95% (80%) correct classification with fewer loci ( |
| 3 | Does the mean MAF of SNPs influence the correct classification rate of the RCA? | Loci with greater MAF are considered more informative than loci with lower MAF ( | The lower the MAF, the more loci were required for an RCA to reach >95% (80%) correct classification rate. The effect appeared larger between MAF = 0.05 and 0.25 than between MAF = 0.25 and 0.5 ( |
| 4 | Which relatedness categories can be assigned with an acceptable correct classification rate, defined as >80% or >95% of dyads assigned to a category being true members of that category? | Natural variance in allele sharing is expected to increase with decreasing relatedness. Therefore, the correct classification rate is expected to decrease with decreasing relatedness (question 2). | PO, FS (except under promiscuity), and R = 0.25 could be assigned with >95% correct classification rate when the informativeness of genetic markers was sufficient ( |
|  |  |  | In a single simulation using 50,000 SNPs and six relatedness categories, R = 0.125 could be assigned with 81.72% correct classification rate in a monogamous scenario with MAF 0.5 (8th blue bar in subplot (5,4) in |
|  |  |  | R = 0.125 was assigned with a >80% correct classification rate in some scenarios when the category R = 0.0625 was included in the analyses ( |
|  |  |  | Note that with the population size and parameters used, ≥95% of individuals are unrelated, so even if all dyads were assigned to the category “unrelated”, the correct classification rate might be >95% [average proportion of unrelated individuals in the simulated population with/without R = 0.0625 considered as related: 0.95/0.98 (monogamy), 0.95/0.98 (polygyny), 0.96/0.98 (promiscuity)]. |
| 5 | Does a population’s mating system influence the correct classification rate of the RCA? | The kinship composition (proportion of dyads per relatedness category) differs between mating systems. Dyads of some categories are expected to occur less frequently in certain mating systems ( | The minimum number of loci required for an RCA with >95% (80%) correct classification rate differed between mating systems ( |
| 6 | Does the proportion of the population sampled affect the correct classification rate of the RCA? | This requires investigation because two opposing processes can be envisaged. First, allele sharing between individuals does not change with the proportion of the population sampled. However, relatedness categories are assigned on the basis of allele frequencies, so the correct classification rate of the assignment may depend on accurate allele frequency estimates, whose precision is expected to increase with the proportion of the population sampled ( | For RCAs with 3200 SNPs, it appears that, independent of mating system and MAF, the proportion of the population sampled did not influence the correct classification rate of the RCA for PO, FS, and R = 0.25 (data not shown). However, for R = 0.125 and the same number of SNPs, the RCA correct classification rate seemed to increase with a decreasing proportion of the population sampled (data not shown). A similar observation was made with 400 SNP loci for categories R = 0.25 and R = 0.125 (third and fourth columns of subplots in |
|  |  | Second, the number of dyads in a sample increases quadratically with sample size (n(n − 1)/2 dyads for n individuals), with the number of unrelated dyads increasing much faster than that of related dyads ( |  |
| 7 | Does excluding relatedness categories from, or adding them to, the assignment alter the correct classification rate of the RCA? | Inevitably, not every study will investigate every category of very distant relatives, so decisions must be made about which categories to assess. Studies that consider only two relatedness categories recommend fewer genetic markers than this study does ( | By excluding certain categories ( |
|  |  |  | An alternative way to increase the correct classification rate may be to keep all categories in the assignment, or even add more, for the calculations, but then not to use the results for certain categories in inferences. Assessing additional categories may help exclude many false positives. |
| 8 | Does a combination of SNP and STR markers improve the correct classification rate of an RCA? | Many research groups are in transition from STR to SNP markers. More markers (if unlinked) are likely to be more informative and thus to provide higher correct classification rates in RCAs; this should also be true for a combination of SNP and STR markers. | Combining SNPs with 20 STR markers improved RCA results when few markers were available and decreased the number of SNPs required to achieve >95% (or >80%) correct classification rate ( |
| 9 | How large is the effect of typing error (due to mutations, allelic dropout, or erroneous scoring) on the correct classification rate of an RCA? | Typing errors decrease the chance that a dyad is assigned to the correct category because the dyad’s observed allele sharing may differ from that expected for the correct category ( | A 2% typing error decreased the correct classification rate of an RCA, thus increasing the number of loci required for 95% correct classification rate for most categories ( |
| 10 | In populations with non-overlapping generations, which relatedness categories can be assigned with >95% correct classification rate? | Trans-generational dyads do not coexist in populations with non-overlapping generations. This changes the expected proportions of observed pedigree dyads and may thus affect the correct classification rates of an RCA by changing the proportions of false positives and true positives. | If generations do not overlap, sampling during a single time period cannot include certain relatedness categories. This leads to fewer pedigree categories being assessed correctly, |
| 11 | What effect does incorporating additional data, such as individual sex, age or mitochondrial DNA (mtDNA) haplotype, have on the correct classification rate of an RCA? | Some false-positive results ( | Age and mtDNA haplotype data increased RCA correct classification rates. For example, the mean correct classification rate in a monogamous system increased from 0.729 (genetic data only) to 0.899 (genetic, age, and mtDNA data) for PO and from 0.414 to 0.699 for FS based on 20 STRs ( |
|  |  |  | Age data had a more positive effect on RCA correct classification rates than mtDNA data for the category PO, and mtDNA had a more positive effect than age for the category FS ( |
RCA, relatedness category assignment; SNP, single-nucleotide polymorphism; STR, short tandem repeat; PO, parent-offspring; FS, full sibs; R = 0.25, half sibs, grandparent-grandchild, avuncular; R = 0.125, first cousins; IBD, identity by descent; MAF, minor allele frequency; R = 0.0625, half first cousins, first cousins once removed, double second cousins.
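The expected IBD sharing quoted in questions 2 and 4 follows from the standard Cotterman coefficients (k0, k1, k2), the probabilities that a dyad shares zero, one, or two alleles IBD at a locus, with expected relatedness r = k1/2 + k2. The Python sketch below tabulates them, assuming no inbreeding; the names are illustrative rather than taken from the study. It shows that PO and FS dyads have the same mean r but different IBD distributions (PO always share exactly one allele IBD), and that the gap between adjacent categories halves at each step down in relatedness, which is one reason more distant categories are harder to separate.

```python
# Cotterman coefficients (k0, k1, k2): probabilities that a dyad shares 0, 1 or 2
# alleles identical by descent (IBD) at a locus, assuming no inbreeding.
IBD_COEFFICIENTS = {
    "PO": (0.0, 1.0, 0.0),
    "FS": (0.25, 0.5, 0.25),
    "R = 0.25": (0.5, 0.5, 0.0),        # half sibs, grandparent-grandchild, avuncular
    "R = 0.125": (0.75, 0.25, 0.0),     # first cousins
    "R = 0.0625": (0.875, 0.125, 0.0),  # half first cousins, first cousins once removed
    "unrel": (1.0, 0.0, 0.0),
}


def relatedness(k):
    """Expected proportion of the genome shared IBD: r = k1 / 2 + k2."""
    _k0, k1, k2 = k
    return k1 / 2 + k2


def adjacent_gaps(categories):
    """Differences in expected r between consecutive categories, in the order given."""
    rs = [relatedness(IBD_COEFFICIENTS[c]) for c in categories]
    return [a - b for a, b in zip(rs, rs[1:])]


for name, k in IBD_COEFFICIENTS.items():
    print(f"{name:<11} r = {relatedness(k):.4f}  (k0, k1, k2) = {k}")

# PO and FS have the same mean r but different IBD distributions.
print(relatedness(IBD_COEFFICIENTS["PO"]) == relatedness(IBD_COEFFICIENTS["FS"]))  # True
# The gap to the next category halves with each step down in relatedness.
print(adjacent_gaps(["FS", "R = 0.25", "R = 0.125", "R = 0.0625"]))  # [0.25, 0.125, 0.0625]
```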
Figure 1 Promiscuity: correct classification rate of relatedness category assignment (RCA) in a promiscuous population (average over 10 simulations). Three different minor allele frequencies (MAF) for single-nucleotide polymorphisms (SNPs), seven different numbers of SNP loci (individual bars from left to right: 50, 100, 200, 400, 800, 1600, 3200), four different numbers of STR loci (from left to right: 10, 20, 40, 80), and a combination of SNPs with 20 STR loci were simulated. On the left vertical axes, the proportion of each bar shown in the color of the correct pedigree relatedness category (PO: parent-offspring; FS: full sibs; unrel: unrelated) indicates the correct classification rate of the category assignment based on the genetic loci. Other colors indicate the true categories of erroneously assigned dyads. The right vertical axes, and the lines in the subplots, indicate the number (No) of dyads assigned to each category (the true number of dyads can be inferred where correct classification rates of almost 100% were achieved). The orders of magnitude at the top of the No dyads/category scale in the first row apply to all No dyads/category scales below it. Figure S1 and Figure S2 show the same plots for other mating systems. The variability between the 10 independent simulations is presented in Table S2.
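Figure 1 contrasts SNP panels with MAF of 0.05, 0.25, and 0.5. One common proxy for the information carried by a biallelic SNP is its Hardy-Weinberg expected heterozygosity, 2p(1 − p), where p is the minor allele frequency. The sketch below computes it for the three simulated MAFs; using heterozygosity as the informativeness measure is an assumption made here, not necessarily the measure used in the study.

```python
def expected_heterozygosity(maf):
    """Hardy-Weinberg expected heterozygosity 2p(1 - p) of a biallelic SNP with minor allele frequency p."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    return 2 * maf * (1 - maf)


simulated_mafs = (0.05, 0.25, 0.5)
het = {maf: expected_heterozygosity(maf) for maf in simulated_mafs}
for maf, h in het.items():
    print(f"MAF {maf:<4} expected heterozygosity {h:.3f}")

# Relative gain in heterozygosity when moving from one simulated MAF to the next.
print(f"0.05 -> 0.25: x{het[0.25] / het[0.05]:.2f}")  # x3.95
print(f"0.25 -> 0.5:  x{het[0.5] / het[0.25]:.2f}")   # x1.33
```

Under this proxy, the gain from MAF 0.05 to 0.25 (about fourfold) is much larger than from 0.25 to 0.5 (about one third), which is in line with the pattern reported under question 3, although heterozygosity alone does not determine RCA performance.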
Figure 2 Effect of additional data on the correct classification rate of relatedness category assignment in a monogamous population using 20 STRs. In addition to age and/or mtDNA haplotype, the sex of the individuals was known. Plotted are the mean and range of the correct classification rate based on 10 independent simulations.
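One way age and mtDNA data can raise correct classification rates is by vetoing genetic assignments that are biologically impossible. The Python sketch below is illustrative only and is not the procedure used in the study: it assumes strict maternal inheritance of mtDNA without mutation or heteroplasmy, exactly known ages and sexes (sex was known in these simulations), and a hypothetical minimum age at first reproduction. The PO check relies mainly on age and the FS check only on mtDNA, in line with the pattern reported under question 11.

```python
from dataclasses import dataclass

# Hypothetical species parameter; replace with the study species' value.
MIN_AGE_AT_FIRST_REPRODUCTION = 5


@dataclass(frozen=True)
class Individual:
    sex: str           # "F" or "M"
    age: int           # age in years at sampling
    mt_haplotype: str  # mtDNA haplotype label


def is_consistent(category, a, b, min_breeding_age=MIN_AGE_AT_FIRST_REPRODUCTION):
    """Return False if age or mtDNA data rule out assigning dyad (a, b) to category."""
    if category == "PO":
        parent, offspring = (a, b) if a.age >= b.age else (b, a)
        if parent.age - offspring.age < min_breeding_age:
            return False  # the older individual is too young to be the parent
        if parent.sex == "F" and parent.mt_haplotype != offspring.mt_haplotype:
            return False  # a mother passes her mtDNA haplotype to her offspring
        return True
    if category == "FS":
        return a.mt_haplotype == b.mt_haplotype  # full sibs share a mother
    return True  # no constraint is checked for other categories in this sketch


mother = Individual("F", 12, "A")
offspring_1 = Individual("M", 3, "A")
other_juvenile = Individual("F", 4, "B")
print(is_consistent("PO", mother, offspring_1))          # True
print(is_consistent("PO", offspring_1, other_juvenile))  # False: age difference too small
print(is_consistent("FS", offspring_1, other_juvenile))  # False: different mtDNA haplotypes
```

In a full analysis, a vetoed dyad would be reassigned to its next most likely consistent category rather than simply dropped; how that is done is a design choice this sketch leaves open.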
Minimum number of SNP and/or STR loci required per category for a relatedness category assignment with >95% (>80%) correct classification rates
| Mating System | Marker | MAF | PO | FS | R = 0.25 | R = 0.125 | Unrel |
|---|---|---|---|---|---|---|---|
| Monogamy | SNP | 0.05 | 3200 (800) | 1600 (800) | 3200 (1600) | − (−) | 50 (50) |
|  |  | 0.25 | 200 (100) | 200 (200) | 1600 (800) | − (−) | 50 (50) |
|  |  | 0.5 | 100 (100) | 200 (100) | 1600 (400) | − (−) | 50 (50) |
|  | STR | n/a | 80 (40) | 80 (40) | − (−) | − (−) | 10 (10) |
|  | SNP and STR | 0.05 | 800 (100) | 800 (100) | 3200 (800) | − (−) | 50 (50) |
|  |  | 0.25 | 100 (50) | 200 (50) | 1600 (400) | − (−) | 50 (50) |
|  |  | 0.5 | 100 (50) | 200 (50) | 1600 (400) | − (−) | 50 (50) |
| Polygyny | SNP | 0.05 | 1600 (800) | 3200 (1600) | 3200 (800) | − (−) | 50 (50) |
|  |  | 0.25 | 200 (100) | 800 (400) | 800 (400) | − (−) | 50 (50) |
|  |  | 0.5 | 100 (100) | 400 (200) | 800 (400) | − (−) | 50 (50) |
|  | STR | n/a | 40 (40) | − (80) | − (−) | − (−) | 10 (10) |
|  | SNP and STR | 0.05 | 400 (100) | 1600 (800) | 1600 (800) | − (−) | 50 (50) |
|  |  | 0.25 | 100 (50) | 800 (200) | 800 (400) | − (−) | 50 (50) |
|  |  | 0.5 | 50 (50) | 400 (200) | 800 (400) | − (−) | 50 (50) |
| Promiscuity | SNP | 0.05 | 800 (400) | − (−) | 3200 (800) | − (−) | 50 (50) |
|  |  | 0.25 | 200 (100) | − (−) | 1600 (400) | − (−) | 50 (50) |
|  |  | 0.5 | 100 (100) | − (−) | 1600 (400) | − (−) | 50 (50) |
|  | STR | n/a | 40 (40) | − (−) | − (−) | − (−) | 10 (10) |
|  | SNP and STR | 0.05 | 200 (50) | − (−) | 3200 (800) | − (−) | 50 (50) |
|  |  | 0.25 | 50 (50) | − (−) | 1600 (400) | − (−) | 50 (50) |
|  |  | 0.5 | 50 (50) | − (−) | 800 (400) | − (−) | 50 (50) |
Dashes indicate that the category could not be assigned with a >95% (80%) correct classification rate with the simulated numbers of loci. SNP, single-nucleotide polymorphism; STR, short tandem repeat; MAF, minor allele frequency; PO, parent-offspring; FS, full sibs; R = 0.25, avuncular, half sibs, grandparent-grandchild; R = 0.125, first cousins, half avuncular.
Even though no tested number of loci led to a 95% (80%) correct classification rate for the R = 0.125 category under the simulated population conditions, R = 0.125 is retained in this table because including it in the relatedness category assignment is important for the correct classification rate of R = 0.25.
Note that with the population size and parameters used, >95% of individuals are unrelated (Unrel), so even if all dyads were assigned to the category ‘unrelated’, the correct classification rate might be >95%.
For the SNP and STR rows, values give the number of SNP loci required when combined with 20 STR loci.
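Each cell of the table above pairs two thresholds, >95% and, in parentheses, >80%. The Python sketch below shows one way such a cell could be derived from simulated correct classification rates per number of loci. It assumes that the "minimum number" is the smallest simulated number of loci from which every larger simulated number also passes the threshold, which guards against a single lucky value; the study may have used a different rule, and the rates in the example are invented for illustration.

```python
def minimum_loci(rate_by_loci, threshold):
    """Smallest simulated number of loci from which all larger numbers exceed threshold.

    rate_by_loci maps a simulated number of loci to a correct classification rate.
    Returns None when even the largest simulated number of loci fails (a dash).
    """
    result = None
    for n_loci in sorted(rate_by_loci, reverse=True):
        if rate_by_loci[n_loci] > threshold:
            result = n_loci
        else:
            break  # stop at the first failure when moving towards fewer loci
    return result


def table_cell(rate_by_loci):
    """Format a cell as 'strict (relaxed)' using the >95% and >80% thresholds."""
    def fmt(value):
        return "−" if value is None else str(value)

    strict = minimum_loci(rate_by_loci, 0.95)
    relaxed = minimum_loci(rate_by_loci, 0.80)
    return f"{fmt(strict)} ({fmt(relaxed)})"


# Invented rates for the doubling series of SNP loci used in the simulations.
example_rates = {50: 0.41, 100: 0.62, 200: 0.78, 400: 0.86,
                 800: 0.93, 1600: 0.97, 3200: 0.99}
print(table_cell(example_rates))           # 1600 (400)
print(table_cell({50: 0.20, 3200: 0.60}))  # − (−)
```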