Strategies for determining kinship in wild populations using genetic data.

Abstract

Knowledge of kin relationships between members of wild animal populations has broad application in ecology and evolution research by allowing the investigation of dispersal dynamics, mating systems, inbreeding avoidance, kin recognition, and kin selection as well as aiding the management of endangered populations. However, the assessment of kinship among members of wild animal populations is difficult in the absence of detailed multigenerational pedigrees. Here, we first review the distinction between genetic relatedness and kinship derived from pedigrees and how this makes the identification of kin using genetic data inherently challenging. We then describe useful approaches to kinship classification, such as parentage analysis and sibship reconstruction, and explain how the combined use of marker systems with biparental and uniparental inheritance, demographic information, likelihood analyses, relatedness coefficients, and estimation of misclassification rates can yield reliable classifications of kinship in groups with complex kin structures. We outline alternative approaches for cases in which explicit knowledge of dyadic kinship is not necessary, but indirect inferences about kinship on a group- or population-wide scale suffice, such as whether more highly related dyads are in closer spatial proximity. Although analysis of highly variable microsatellite loci is still the dominant approach for studies on wild populations, we describe how the long-awaited use of large-scale single-nucleotide polymorphism and sequencing data derived from noninvasive low-quality samples may eventually lead to highly accurate assessments of varying degrees of kinship in wild populations.

Keywords: Genetic relatedness; microsatellites; next‐generation sequencing; parentage analysis; sibship reconstruction; single‐nucleotide polymorphisms

Year: 2016 PMID: 27648229 PMCID: PMC5016635 DOI: 10.1002/ece3.2346

Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912

Why Determine Kinship in Wild Animal Populations?

In many social species, members of one sex disperse, while members of the philopatric sex live in close proximity to kin and nonkin. Distinguishing between close relatives and unrelated conspecifics allows individuals to obtain direct or inclusive fitness benefits by biasing affiliative or coalitionary behaviors toward relatives while avoiding inbreeding and competition with relatives (Hamilton 1964). For example, recent studies of wild populations have demonstrated an effect of kinship on allonursing in cooperative breeders (MacLeod et al. 2013), identified kin biases in association (Bercovitch and Berry 2013) and affiliation (Widdig et al. 2016), and shown parallel dispersal of kin (Wikberg et al. 2014) and inbreeding avoidance (Sanderson et al. 2015). Association and close social relationships among relatives may provide adaptive benefits by improved reproductive success through increased longevity or offspring survival (König 1994; Viblanc et al. 2010). Beyond dyadic social relationships, knowledge of the population‐wide distribution of pairs of kin and nonkin can be used to identify dispersal patterns (Van Noordwijk et al. 2012) or reproductive skew (Vigilant et al. 2015). Several studies have used kinship analyses to characterize mating systems of wild populations as monogamous (Huck et al. 2014), polyandrous (Barth et al. 2014), or polygynous (Muralidhar et al. 2014), identified extra‐pair parentage in socially monogamous species (Barelli et al. 2013) or cases of adoption and cuckoldry (Stiver et al. 2012). Moreover, at the population level, even members of solitary species may derive benefits from kin biases by avoiding inbreeding (Metzger et al. 2010) or competition among relatives (Lizé et al. 2006) or by occupying territories next to kin which may lead to reduced aggression (Bradley et al. 2004). Furthermore, dyadic kinship information can be used to estimate the heritability of traits (Dubuc et al. 2014). 
From a practical perspective, kinship analyses can be applied to inform conservation efforts such as the breeding and stocking management of endangered fish populations (O'Reilly and Kozfkay 2014). Studies of kinship in the wild are often preferable to studies in captivity because aspects such as the kinship structure in the population, dispersal, mating, and kin‐biased behavior may be strongly altered under captive conditions. However, analysis of kinship patterns can be challenging in wild populations. Maintenance of long‐term field sites with individually identified animals and reconstructed multigenerational pedigrees is difficult (Clutton‐Brock and Sheldon 2010). A long‐standing goal in kinship studies has therefore been to assess kin relationships by the use of genetic analysis, which has typically employed microsatellite genotype analysis of DNA derived from noninvasive samples.

Genes or Genealogy?

Comparing genotypes of different individuals and classifying them into kinship categories such as “sister” or “cousin” is difficult. This is because genetic relatedness is a continuous parameter determined by the proportion of the genome shared between two individuals by descent from a common ancestor and, particularly if inferred from a limited number of markers, does not necessarily correspond to theoretical expectations based on the categorical pedigree relationship for a given dyad (Blouin 2003). The segregation of pairs of chromosomes during the first meiotic cell division as well as chromosomal recombination are stochastic processes leading to large variation in the amount of the genome that is identical by descent between two relatives, with the exception of parent–offspring dyads and monozygotic twins (Fig. 1; Rasmuson 1993). For example, while full siblings share on average 50% of their genome, some may share considerably less or more (e.g., Visscher et al. 2006; Fig. 1), the variance being dependent on the number of chromosomes and their crossover rates (Hill and Weir 2011). Therefore, although they are generally strongly correlated, pedigree relatedness (kinship) and genetic relatedness (realized relatedness) are conceptually and often empirically different.
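The spread of realized sharing around the 50% expectation for full siblings can be illustrated with a small simulation. The sketch below is only a toy model: it assumes the genome consists of `n_segments` equally sized blocks that segregate independently, which ignores linkage and real crossover rates. The function name and all parameter values are hypothetical and not taken from the studies cited above.

```python
import random
import statistics


def simulate_full_sib_sharing(n_segments, n_pairs, seed=1):
    """Simulate the realized proportion of the genome shared IBD by full siblings.

    Assumption (illustrative only): the genome is modelled as n_segments
    equally sized blocks that segregate independently of each other.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_pairs):
        shared = 0.0
        for _ in range(n_segments):
            # For each block, both siblings receive the same maternal copy
            # with probability 1/2, and likewise for the paternal copy.
            maternal_ibd = rng.random() < 0.5
            paternal_ibd = rng.random() < 0.5
            # Each parental copy shared IBD contributes one half of the block.
            shared += 0.5 * maternal_ibd + 0.5 * paternal_ibd
        proportions.append(shared / n_segments)
    return proportions


# Hypothetical settings: 40 independent blocks, 2000 sibling pairs.
sharing = simulate_full_sib_sharing(n_segments=40, n_pairs=2000)
print(f"mean  = {statistics.mean(sharing):.3f}")
print(f"sd    = {statistics.stdev(sharing):.3f}")
print(f"range = {min(sharing):.3f} to {max(sharing):.3f}")
```

Under this simplified model the mean stays close to 0.5, while fewer independent blocks (fewer chromosomes or less recombination) widen the spread, which is the qualitative pattern described in the text.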

Figure 1

Schematic of four chromosome pairs, showing the parental origins of segments of the genome shared identical by descent (IBD) by a pair of full siblings. Green segments of the genome are passed on to each offspring by the mother, and blue segments are passed on to each offspring by the father. Due to crossover events, parts of either chromosome can be passed on to the offspring.

Pedigree estimates of relatedness may be inaccurate because they require the assumption that founders are outbred and unrelated. In combination with the increased ability to accurately determine realized relatedness, this has led many to question the usefulness of relatedness derived from pedigrees, particularly in the context of heritability and inbreeding (Gay et al. 2013; Speed and Balding 2014; Kardos et al. 2015; Wang 2015). Recent human studies even use the actual genetic similarity of large numbers of unrelated individuals, instead of close relatives, to estimate heritability and predict phenotypes. This use of unrelated individuals reduces the variance in inferred heritability among dyads and increases the possibility of pinpointing the heritability of a trait to specific genomic regions (Speed and Balding 2014). In theory, the same principle could be used to investigate kin recognition in species which recognize kin via phenotype matching by correlating genetic similarity with biases in behavior as well as to identify the genetic regions involved. The more genetically similar two individuals are, the more likely they are to share alleles for the genes involved in kin recognition by phenotype matching. According to Hamilton's (1964) rule, we would thus predict that individuals prefer more genetically similar individuals independent of their categorical kinship. Yellow baboons, for example, likely recognize paternal kin via a combination of social familiarity and phenotype matching (Smith et al. 2003), but strong social bonds also exist among unrelated individuals (Silk et al. 2006). 
One could thus hypothesize that preferred unrelated social partners are chosen based upon genetic similarity due to “misdirected” kin recognition by phenotype matching. For yellow baboons, other factors, such as rank or age similarity, almost certainly have larger effects than genetic similarity in influencing the choice of social partners among unrelated individuals, so that thousands of individuals might be necessary to obtain sufficient power to detect any effects of genetic similarity on kin biases (Silk et al. 2006; Visscher et al. 2014). In an experimental design, juveniles of Atlantic salmon and brook trout preferred kin with whom they shared both alleles for an MHC class II gene to kin with whom they shared no alleles and preferred nonkin sharing both alleles to nonkin sharing no alleles (Rajakaruna et al. 2006). In this study, the influence of a candidate gene on kin recognition was investigated. Generally, however, if a limited number of markers are used to determine genetic relatedness, and these markers are not by chance linked to genetic regions involved in kin recognition, pedigree relatedness should more accurately represent the genome‐wide sharing of alleles than genetic relatedness and may then be the more accurate predictor of kin bias. Analogous comparisons of marker‐ and pedigree‐based heritability estimates show that thousands of single‐nucleotide polymorphisms (SNPs) are necessary to estimate heritability with the same accuracy as when using pedigree relatedness (Gay et al. 2013; Bérénos et al. 2014). Such numbers of SNPs are still unavailable for most studies of nonmodel organisms, which, particularly for wild populations, often rely upon poor‐quality DNA derived from noninvasive samples and consequently employ analyses of relatively small numbers of microsatellite loci (Box 1).

Box 1. Genetic marker systems and noninvasive sampling.

Studies of wild animals typically rely upon noninvasive samples such as hair (Morin & Woodruff 1992), blow (Frère et al. 2010), food wadges (Hashimoto et al. 1996), feathers and egg membranes (Pearce et al. 1997), shed skin (Villarreal et al. 1996), urine (Hayakawa & Takenaka 1999), or fecal samples (Höss et al. 1992). Although the DNA extracted from these samples is usually degraded and contains low proportions of endogenous DNA, accurate microsatellite genotypes can be obtained if extensive replication is performed (Taberlet & Luikart 1999). First described in the 1980s, microsatellites (STRs) are tandem repeats of short sequences and have long been the most common markers used in studies of wild populations (Fig. B1) with single‐nucleotide polymorphisms (SNPs), single base‐pair differences between the genomes of two individuals of a species, and next‐generation sequencing being less commonly used. The advantages and disadvantages of microsatellites and SNPs for population genetic applications in general have been extensively reviewed (Morin et al. 2004; Guichoux et al. 2011).

Figure B1

ISI Web of Knowledge cumulative search results per year. Search words: “wild population” and “microsatellite” (STR) or “SNP” (single‐nucleotide polymorphism) or “next‐generation sequencing” (NGS), excluding “plant.”

Advantages of STRs in kinship analyses

Highly polymorphic. High cross‐species amplification success (e.g., Buschiazzo & Gemmell 2010). Sibship reconstruction possible according to the 4‐ and 2‐allele property (Berger‐Wolf et al. 2007). Generally high power for kinship analyses; ~6× the power of SNPs (Fig. B2).

Figure B2

Relationship between the number of microsatellites (STRs) and the number of single‐nucleotide polymorphisms (SNPs) when both marker systems perform equally well in kinship analyses. Data are simulated data or empirical population data (Table S1).

Advantages of SNPs in kinship analyses

Biallelic: Few genotypes necessary to accurately estimate allele frequencies. Lower and predictable mutation rates (Ellegren 2004). Shorter fragments amplified: Greater amplification success from degraded DNA (Campbell & Narum 2009). As many loci have to be typed, the resulting genotypes may be more representative of the entire genome. Software for the analysis of genetic marker data is increasingly developed for SNP data only.

In contrast to phenotype matching, kin recognition mediated by familiarity or contextual cues is independent of genetic relatedness, but dependent on pedigree relationships and the resulting spatial and temporal association of individuals. These include a close association between mother and offspring in species with maternal care, close association of littermates at a young age which usually are maternal or full siblings, or age proximity which could be used as a cue for paternal relatedness in species for which male reproductive skew leads to cohorts of paternal siblings (Widdig 2013). Cross‐fostering experiments have shown that individuals bias their behavior toward familiar nonkin over unfamiliar kin (reviewed in: Mateo and Holmes 2004). In most mammals, individuals may recognize their mothers, but may not bias their behavior toward other individuals having the same degree of genetic relatedness (full siblings, father–offspring), and thus, the pedigree relationship, and not the degree of genetic relatedness, is informative with regard to kin bias. For example, chimpanzee males bias affiliative and cooperative behaviors toward maternal, but not paternal brothers despite a nominal relatedness coefficient of 0.25 for both kinds of relatives (Langergraber et al. 2007). 
As genetic similarity, particularly when determined from a limited set of genetic markers, does not distinguish among these different kinds of kin and variance for even the same type of kin is high, the indiscriminate inclusion of the type of kin that cannot be recognized will lower the correlation of genetic relatedness and kin bias. Therefore, if individuals recognize kin through kinship‐correlated familiarity or contextual cues, pedigree kinship and not genetic relatedness will be the best predictor of kin bias, suggesting that even in the genomic era, knowledge of pedigree relationships can be useful. As detailed in the following sections, even the small sets of genetic markers typically available for studies of wild populations can be used to make inferences on kin relationships.

Assessing Parentage

Parent–offspring relationships can be determined with higher confidence than other relationships because, with the exception of instances of germline mutations or genotyping error, the parent and the offspring must share at least one allele at every locus. In many wild species, parental care, typically by the mother, easily identifies one likely parent. Direct comparison of mother, offspring, and potential father genotypes, if the markers are sufficiently variable, may directly reveal parentage relationships if all candidate parents were perfectly sampled. However, analysis in a statistical framework that allows for the consideration of error rates, proportion of candidate parents sampled, and other factors can aid in assessing the confidence of the assignments (e.g., CERVUS (Marshall et al. 1998), FRANz (Riester et al. 2009), KINGROUP (Konovalov et al. 2004)). For example, testing for parentage in a likelihood framework assesses the significance of the likelihood ratio of a dyad, that is, the likelihood that the dyad has a certain relationship given its patterns of allele sharing (e.g., parent–offspring) over the likelihood that the dyad has alternative relationships (e.g., unrelated) (Fig. 2).
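The likelihood ratio described above can be sketched for codominant autosomal loci using the standard probabilities that a dyad shares zero, one, or two alleles identical by descent. The code below is a minimal illustration and not the implementation used by CERVUS, FRANz, or KINGROUP: it assumes unlinked loci, Hardy–Weinberg proportions, and no genotyping error or mutation, and all function names and example genotypes are hypothetical.

```python
import math

# IBD coefficients (k0, k1, k2): probabilities that a dyad shares 0, 1, or 2
# alleles identical by descent at an autosomal locus.
RELATIONSHIPS = {
    "po": (0.0, 1.0, 0.0),    # parent-offspring
    "fs": (0.25, 0.5, 0.25),  # full siblings
    "hs": (0.5, 0.5, 0.0),    # half siblings
    "ur": (1.0, 0.0, 0.0),    # unrelated
}


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered diploid genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_given_one_ibd(geno_b, geno_a, freqs):
    """P(geno_b | geno_a) when the two individuals share exactly one allele IBD."""
    b1, b2 = geno_b
    total = 0.0
    for shared in geno_a:  # either allele of A is the IBD allele with probability 1/2
        if b1 == b2:
            match = freqs[b1] if shared == b1 else 0.0
        else:
            match = (freqs[b2] if shared == b1 else 0.0) + (freqs[b1] if shared == b2 else 0.0)
        total += 0.5 * match
    return total


def dyad_likelihood(genos_a, genos_b, locus_freqs, relationship):
    """Multilocus likelihood of two genotypes under a relationship (unlinked loci assumed)."""
    k0, k1, k2 = RELATIONSHIPS[relationship]
    likelihood = 1.0
    for geno_a, geno_b, freqs in zip(genos_a, genos_b, locus_freqs):
        identical = sorted(geno_a) == sorted(geno_b)
        likelihood *= genotype_prob(geno_a, freqs) * (
            k0 * genotype_prob(geno_b, freqs)
            + k1 * prob_given_one_ibd(geno_b, geno_a, freqs)
            + k2 * (1.0 if identical else 0.0)
        )
    return likelihood


def likelihood_ratio(genos_a, genos_b, locus_freqs, hypothesis, null):
    """Likelihood of the hypothesized relationship over that of the null relationship."""
    numerator = dyad_likelihood(genos_a, genos_b, locus_freqs, hypothesis)
    denominator = dyad_likelihood(genos_a, genos_b, locus_freqs, null)
    if denominator == 0.0:
        raise ValueError("null hypothesis has zero likelihood for this dyad")
    return numerator / denominator


# Hypothetical data: two loci, each with five equifrequent alleles.
freqs = {allele: 0.2 for allele in "ABCDE"}
loci = [freqs, freqs]
ind_1 = [("A", "B"), ("C", "C")]
ind_2 = [("A", "B"), ("C", "D")]
lr = likelihood_ratio(ind_1, ind_2, loci, "po", "ur")
print(f"LR = {lr:.2f}, LOD = {math.log10(lr):.2f}")
```

A dyad that shares no allele at some locus has a parent–offspring likelihood of zero under these assumptions, which is why programs that tolerate genotyping error are preferable for real data. Assessing significance would additionally require simulated or permuted dyads, which this sketch omits.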

Figure 2

Proportion of kin (mis)classified as parent–offspring in likelihood analyses of parentage. The likelihood ratio value is the likelihood (L) of the alternative hypothesis of parent–offspring relationship over the likelihood of a null hypothesis of (A) no relatedness or (B) a complex null hypothesis simultaneously considering full siblingship, half siblingship, and no relatedness. Even when using a conservative P‐value, misclassifications occur (A), while testing a complex null hypothesis (B) reduces the number of misclassifications of other kin categories as parent–offspring but more than halves the proportion of true classifications. Sets of 1000 dyads per kinship category (po: parent–offspring, fs: full siblings, hs: half siblings, ur: unrelated) were generated in KINGROUP v2 (Konovalov et al. 2004) using ten loci with five equifrequent alleles per locus. Likelihood analyses were conducted in KINGROUP v2. P‐values were generated through 1,000,000 permutations.

Parentage analysis becomes markedly more challenging in situations where neither parent is known by observation. Essentially, the same principle of shared alleles and exclusion can be applied, but the assignment becomes much more complicated as the identities of the maternal and paternal alleles in the offspring are unknown. Several different approaches have been devised to assign parentage if few or no parent–offspring relationships are known or several sires cannot be excluded (Jones et al. 2010; Harrison et al. 2013). Generally, assignment error increases with an increasing number of candidate parents, but decreases the greater the proportion of candidate parents sampled (Marshall et al. 1998; Harrison et al. 2013). Assignment error also depends on the presence of other categories of kin in the sample. 
This is because a nonparent relative of either the offspring or a true parent, particularly one related to the offspring or parent at a level of 0.25 or higher, is likely to be misclassified as a parent (Thompson and Meagher 1987; Marshall et al. 1998; Olsen et al. 2001; Fig. 2). Thus, despite the unique genetic relationship between parent and offspring, false‐positive and false‐negative assignments are to be expected in parentage analyses. Given that unrelated individuals are highly unlikely to be classified as parent–offspring (Fig. 2) and that knowledge of parentage is likely used to assess a behavioral or ecological hypothesis, it might be acceptable that some putative parent–offspring dyads are actually not parent–offspring, but otherwise closely related. Yet, in long‐term studies, for which one parent is known by observation and the other genetically assigned, assignment error will be extremely low and continued parentage analysis can identify maternal and paternal kinship over generations and thus be used to reconstruct increasingly deep pedigrees (e.g., Van Horn et al. 2008).

Sibship Reconstruction

In species for which the population can be expected to mainly contain groups of full and/or half siblings, sibship reconstruction is a powerful tool for identifying the related individuals. This approach is more accurate than evaluating dyads because it considers the relationships among all genotypes simultaneously (Wang and Santure 2009). The success of sibship reconstruction generally improves with increases in the number of individuals per full‐ or half‐sib family, although full sibship may be determined with high accuracy for sibling groups as small as four (Wang and Santure 2009), but may decrease with an increasing number of families (Thomas and Hill 2002; Sheikh et al. 2008; Almudevar and Anderson 2012; Wang 2012). For example, with just four highly variable loci, 12 full‐sibling families of 760 Atlantic salmon could be accurately partitioned (Almudevar and Anderson 2012; Wang 2012). Such analyses are extremely accurate, but become less successful if dyads with a lower degree of relatedness are included; for example, the inclusion of cousins reduces the power and accuracy of the analysis (Thomas and Hill 2002; Wang 2004). Analysis of populations with complex kinship structures, such as may arise when both sexes are polygamous, can lead to prohibitively long run times and nonconvergence (Wang and Santure 2009; Wang 2012; Dexter and Brown 2013). Such complexity, including the coresidence of different categories of close and distant relatives, may be present in social groups featuring promiscuous mating systems, small litter sizes, long life spans, overlapping generations, or immigration. Hence, approaches identifying different types of kin in groups with complex kin compositions are needed.

Identifying the Other Types of Kin

One seemingly straightforward approach to determining the kin relationship of any dyad relies on the use of dyadic relatedness estimators, which gauge the amount of genetic material shared by descent between individuals. The accuracy and precision of these estimators depend on the number of markers typed, their polymorphism, allele frequency distribution, and the kin structure of the population (Milligan 2003; Csilléry et al. 2006; Konovalov and Heg 2008; Van Horn et al. 2008). Although the relatedness coefficient averaged over many dyads usually corresponds well to the expected pedigree relatedness, the previously discussed inherent difference between genetic and pedigree relatedness leads to overlapping distributions of the relatedness coefficient for different kinship categories (Blouin et al. 1996; Fig. 3A). This is in principle independent of methodological inaccuracies in genetic relatedness estimates due to the usage of a limited number of genetic markers, variance in the sharing of alleles by state, or inaccurate measures of the population's allele frequencies. Consequently, the relatedness coefficient for any dyad is an imperfect measure of that dyad's pedigree kinship, and correlations between pedigree and genetic relatedness will be imperfect. This holds true even if very large numbers of markers are used. For example, a study in zebra finches found a maximum correlation of 0.86 between the genetic relatedness determined with the full dataset of 771 SNPs and the pedigree relatedness of a multigenerational zebra finch pedigree. Because linkage among loci increases the variance of the estimate (Glaubitz et al. 2003; Santure et al. 2010), even a study using more than 9000 SNPs to assess the correlation between genetic and pedigree relatedness in pigs found a correlation of just 0.85 (Lopes et al. 2013).

Figure 3

Misclassification and true classification rates per kinship category. Dyads of unrelated individuals (black), half siblings (green), and full siblings (blue) genotyped at 23 autosomal microsatellite loci were simulated in KINGROUP v2 using allele frequencies of a band of hamadryas baboons (Band 1, Städele et al. 2015). Tables indicate the percentage of true classifications (bold), misclassifications, and unclassified dyads. The top row indicates the actual relationship. Truly unrelated dyads were combined with unclassified dyads, but could also be classified by defining cutoff values. (A) Cutoff values for the relatedness coefficient (edges of the shaded areas) can be modified to achieve low misclassification rates leading to low rates of true classifications (bold). Shaded areas indicate the proportion of dyads classified as half (green) or full siblings (blue). (B) Cutoff values for relatedness coefficient and two likelihood ratios applied simultaneously (lines); log‐likelihood ratio 1 (LOD 1) hypothesis: half siblings, null hypothesis: parent–offspring, full siblings, unrelated; log‐likelihood ratio 2 (LOD 2) hypothesis: full siblings, null hypothesis: parent–offspring, half siblings, unrelated. This approach leads to similarly low misclassification rates as (A) but allows for more true classifications.

Despite the lack of a perfect correlation of genetic relatedness with pedigree relatedness, some approaches exist to identify dyads which can be classified with confidence. For example, Blouin et al. (1996) suggested first using simulated distributions of the relatedness coefficient for certain kinship categories and then defining cutoff values to classify dyads as belonging to these kinship categories while determining consequent misclassification rates from the simulated distributions. How the cutoff values are chosen determines the misclassification rate. 
It is possible to thus push the misclassification rates for certain kinship categories under a desired threshold (e.g., 5%, although much lower rates might be desirable if many dyads are evaluated) by choosing narrow cutoff values (shaded areas in Fig. 3A). However, this approach leads to low rates of true classifications, that is, dyads of a certain kinship category which are correctly classified as belonging to that category, and many dyads remain unclassified because their values fall between the cutoffs (Fig. 3A). By combining cutoffs for likelihood ratios, often expressed as the logarithm of the likelihood ratio (log odds ratios, LOD) and testing different hypotheses about the kinship status of a dyad, it may be possible to improve the resolution of such an analysis (Thompson and Meagher 1987). Additional power can be added by combining cutoffs for likelihood ratios with cutoffs for the relatedness coefficient (Langergraber et al. 2007; Städele et al. 2016; Fig. 3B). Although relatedness coefficients and likelihood ratios are strongly correlated because they are derived from the same autosomal data, they are sufficiently different so that the resolution of classifications of dyadic kinship is improved by combining them. This leads to low misclassification rates combined with high true classification rates (Fig. 3B). Cutoffs can be defined by systematically testing values which maximize the true classification rate and minimize the misclassification rate or can be empirically defined using data from dyads of known pedigree relationship or from relatives identified through pedigrees reconstructed after parentage analysis (Langergraber et al. 2007; Städele et al. 2016). Using such an approach, low misclassification rates can be achieved with a relatively small set of loci. 
It is important to note that even when using this combined approach, if low misclassification rates are prioritized, a large number of dyads will be unclassified because their parameter values fall between the cutoff values for different kinship categories (Fig. 3B). A different approach to identifying dyadic kinship which also accepts unclassified dyads as a trade‐off for low misclassification rates is based on calculating P‐values associated with likelihood ratios and then selecting a subset of dyads by applying the false discovery rate procedure (Benjamini and Hochberg 1995) which controls for an expected proportion of type I errors (Skaug et al. 2010). The subset of dyads is then genotyped at a second set of loci, and dyads are accepted as having the hypothesized kin relationship according to a nominal level of P‐value associated with the new likelihood ratio value. These approaches could theoretically be used to classify dyads beyond the second degree of kinship; however, the overlap of the distributions of measures of genetic relatedness increases with a decreasing degree of kinship, and eventually, no satisfactory trade‐off between misclassifications and correct classifications can be reached. In sum, by accepting the limitations of certain rates of misclassification as well as the inability to classify every single dyad, it is possible to infer dyadic kinship up to the second degree for many pairs of individuals living in populations with complex kin compositions, even when using the limited numbers of autosomal markers typically available for wild populations.
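The trade‐off between misclassification and unclassified dyads under a cutoff scheme can be made concrete with a short sketch. The code below only illustrates the bookkeeping: the relatedness values are drawn from normal distributions as a hypothetical stand‐in for distributions simulated from real allele frequencies, and the cutoff values, standard deviation, and function names are assumptions chosen for illustration.

```python
import random


def classify_by_cutoffs(r, cutoffs):
    """Return the category whose [low, high] interval contains r, or None if unclassified."""
    for category, (low, high) in cutoffs.items():
        if low <= r <= high:
            return category
    return None


def classification_rates(simulated, cutoffs):
    """For each true category, return the proportion of dyads assigned to each outcome."""
    table = {}
    for true_category, values in simulated.items():
        counts = {}
        for r in values:
            assigned = classify_by_cutoffs(r, cutoffs) or "unclassified"
            counts[assigned] = counts.get(assigned, 0) + 1
        table[true_category] = {k: v / len(values) for k, v in counts.items()}
    return table


# Hypothetical stand-in for simulated relatedness coefficients:
# normal distributions centred on the expected pedigree relatedness.
rng = random.Random(1)
expected_r = {"ur": 0.0, "hs": 0.25, "fs": 0.5}
simulated = {
    category: [rng.gauss(mean, 0.1) for _ in range(1000)]
    for category, mean in expected_r.items()
}

# Narrow, non-overlapping cutoffs (illustrative values only).
cutoffs = {"hs": (0.20, 0.30), "fs": (0.45, 1.00)}

rates = classification_rates(simulated, cutoffs)
for true_category, row in rates.items():
    print(true_category, {k: round(v, 3) for k, v in sorted(row.items())})
```

Narrowing the intervals lowers the proportion of dyads assigned to a wrong category but increases the unclassified proportion. A second criterion, such as a cutoff on a log‐likelihood ratio, could be added inside `classify_by_cutoffs` to mirror the combined approach.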

The Value of Nonautosomal Marker Data and Demographic Information

The inheritance patterns of the maternally inherited mitochondrial DNA (mtDNA), paternally inherited Y‐chromosome, and bi‐parentally inherited X‐chromosome make them powerful additions to kinship analyses using autosomal data by identifying false‐positive assignments of kinship and thereby reducing misclassification rates (Kopps et al. 2015). Of the nonautosomal markers, mtDNA is commonly used in studies of wild populations due to a high degree of sequence identity of many of its segments among vertebrates making it an easy target for cross‐species amplification, as well as the presence of high copy numbers leading to usually good amplification from low‐quality samples. X‐ and Y‐linked loci have been less widely used and may have to be identified de novo for many species, but these can in principle also be genotyped using cross‐species amplification, although low levels of Y‐chromosomal variation can make it difficult to identify polymorphic Y‐linked markers (Ellegren 2003; Greminger et al. 2010). Fathers and sons have to share Y‐haplotypes, and mothers and offspring have to share mtDNA haplotypes. Fathers and daughters as well as mothers and offspring have to share at least one allele at every locus of the X‐chromosome. Thus, simple comparisons can reveal misclassifications, and the more diverse the marker, the more likely it is that misclassifications are identified because dyads are less likely to share alleles/haplotypes by chance. In a study of hamadryas baboons, we found that misclassification rates can be greatly reduced even using just one Y‐linked microsatellite locus or four X‐linked microsatellite loci with low levels of polymorphism (Fig. 4; Band 1, Städele et al. 2015). In addition to identifying misclassifications, the inclusion of nonautosomal marker data can further improve the resolution of kinship analyses. 
For example, likelihood ratio values can be calculated for X‐chromosomal genotypes, making it possible to define cutoffs distinguishing between maternal and paternal siblings if a sufficient number of loci are available (Langergraber et al. 2007). As a further incentive to overcome the challenges of characterization in novel species, it is worth noting that uniparentally inherited markers may also aid in analyses of population structure due to their smaller effective population sizes and help reveal sex‐specific population histories. While mtDNA is maternally inherited for most animals, paternal inheritance of the Y‐chromosome is the norm only in mammals (Sato and Sato 2013). Yet, genetic sex determination via an XY system is found in many fish (Devlin and Nagahama 2002), insects (Sanchez 2008), and reptiles (Modi and Crews 2005). In species with a ZW sex determination system, for which females are the heterogametic sex, shared Z‐chromosomes can facilitate the identification of maternally related male dyads, while shared W‐chromosomes can facilitate the identification of maternally related females. However, species with environmental sex determination and species with a ZW sex determination system lack exclusively paternally inherited markers for the identification of paternal relatives.

Figure 4

Misclassification rates are reduced when autosomal data are supplemented by other information. Reduction in misclassification rates for 1000 randomized sets of 1000 dyads using the mtDNA and Y‐haplotype frequencies and X‐linked microsatellite allele frequencies of a social group of hamadryas baboons, Papio hamadryas (Band 1, Städele et al. 2015). Vertical lines indicate the range, horizontal lines indicate the standard deviation, and circles show the average. Simulations were performed with one Y‐linked microsatellite locus, 13 mtDNA haplotypes, and four X‐linked microsatellite loci (XX). For age, it was assumed for simplicity that two‐thirds of the individuals belonged to one generation and one‐third to another generation and that two individuals of the same generation could not have the supposed relationship.

In addition to genetic markers, information about age, social status, or group membership may be helpful for identifying false‐positive kinship assignments, for example, by identifying dyads that cannot have the purported relationship because of their relative ages (Kopps et al. 2015; Weinman et al. 2015; Fig. 4). Depending on the life history of the species, age may also be useful for identifying the type of relationship within a degree of kinship, for example, whether a dyad of second‐degree relatives are grandparent–grandoffspring, half siblings, full avuncular relatives, or double first cousins, which cannot be distinguished using a small set of autosomal markers. For example, half siblings may be discriminated from grandparent–grandoffspring by the age difference between the individuals in species in which the reproductive life span is shorter than roughly twice the age at first reproduction, as in Hector's dolphins or rock hyraxes (Pacifici et al. 2013). Programs such as FRANz allow information about known parentage, sex, birth/beginning of group membership, death/end of group membership, and age at first birth to be incorporated into parentage analyses (Riester et al. 2009).
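The age criterion for separating half siblings from grandparent–grandoffspring dyads can be sketched in a few lines of Python. This is a minimal illustration rather than a published method: the function name and its parameters are hypothetical, a single age at first reproduction and a single reproductive life span are assumed for both sexes, and gestation time and uncertainty in age estimates are ignored.

```python
def compatible_second_degree_types(age_difference, age_first_reproduction,
                                   reproductive_life_span):
    """Return the relationship types compatible with an observed age gap.

    Assumptions (hypothetical, for illustration only):
    - half siblings share one parent, so their age gap cannot exceed that
      parent's reproductive life span;
    - a grandparent and grandoffspring are separated by two generations, so
      their age gap is at least twice the age at first reproduction.
    All arguments must be in the same time unit (e.g., years).
    """
    if age_difference < 0:
        raise ValueError("age_difference must be non-negative")
    compatible = set()
    if age_difference <= reproductive_life_span:
        compatible.add("half siblings")
    if age_difference >= 2 * age_first_reproduction:
        compatible.add("grandparent-grandoffspring")
    return compatible


# Made-up life-history values: first reproduction at 6, reproductive span 10.
# Because 10 < 2 * 6, the two age ranges do not overlap.
print(compatible_second_degree_types(3, 6, 10))
print(compatible_second_degree_types(15, 6, 10))
```

When the reproductive life span exceeds twice the age at first reproduction, the two ranges overlap and the function returns both types for intermediate age gaps, which mirrors the restriction stated in the text.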

Alternatives to Determining Pedigree Kinship

Although it is possible to reliably determine the dyadic kin relationship for a proportion of dyads in a population by using a range of different markers, judiciously estimating misclassification rates, and employing demographic data when available, determining dyadic kinship for members of wild populations remains challenging. For the investigation of hypotheses that do not require explicit knowledge of kinship, we therefore advocate the use of methods that provide more general and indirect inferences about kinship in a group or population without the necessity of determining the kin relationships of single dyads.

Who is definitely NOT closely related?

In many studies, researchers seek to determine kinship among individuals to control for possible kin biases while studying other factors that potentially influence social relationships or grouping patterns. In these cases, it might often be sufficient to identify dyads that cannot possibly be close kin and then limit analyses to these dyads. The same is true for breeding programs that aim to identify unrelated dyads among potential wild‐caught founders. Individuals not sharing an mtDNA haplotype cannot be close maternal relatives, individuals not sharing at least one allele at every X‐chromosomal locus cannot be paternal sisters or father–daughter, and males not sharing a Y‐chromosomal haplotype cannot be close paternal relatives. A considerable advantage of this exclusion approach is that one polymorphic marker of each category is sufficient to exclude these categories of kinship, although a larger number and greater variability will lead to the exclusion of more dyads.
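The three exclusion rules described above can be expressed as a short Python sketch. The `Individual` record, the field names, and the category labels are hypothetical, and the sketch assumes error‐free genotypes with no mutation; in practice, genotyping error and missing data would need to be handled before excluding a dyad.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Individual:
    name: str
    sex: str                                   # "M" or "F"
    mtdna: Optional[str] = None                # mtDNA haplotype label
    y_haplotype: Optional[str] = None          # males only
    # X-linked genotype: locus -> alleles (two for females, one for males)
    x_genotype: Optional[Dict[str, Tuple[str, ...]]] = None


def shares_x_allele_at_every_locus(a, b):
    """True if the dyad shares at least one allele at every jointly typed X locus."""
    xa, xb = a.x_genotype or {}, b.x_genotype or {}
    loci = set(xa) & set(xb)
    # With no jointly typed loci nothing can be excluded, so return True.
    return all(set(xa[locus]) & set(xb[locus]) for locus in loci)


def excluded_categories(a, b):
    """Return the kinship categories that the marker data rule out for a dyad."""
    excluded = set()
    if a.mtdna and b.mtdna and a.mtdna != b.mtdna:
        excluded.add("close maternal relatives")
    if not shares_x_allele_at_every_locus(a, b):
        if a.sex == "F" and b.sex == "F":
            excluded.add("paternal sisters")
        elif {a.sex, b.sex} == {"M", "F"}:
            excluded.add("father-daughter")
    if (a.sex == "M" and b.sex == "M" and a.y_haplotype and b.y_haplotype
            and a.y_haplotype != b.y_haplotype):
        excluded.add("close paternal relatives (male line)")
    return excluded


# Made-up example: two females with different mtDNA and no shared allele at X1.
f1 = Individual("f1", "F", "H1", x_genotype={"X1": ("120", "124")})
f2 = Individual("f2", "F", "H2", x_genotype={"X1": ("122", "126")})
print(excluded_categories(f1, f2))
```

A dyad for which the returned set is empty is not thereby shown to be related; the approach only identifies dyads that cannot belong to the listed categories.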

Correlating dyadic relatedness coefficients and other variables

Average dyadic relatedness coefficients can be compared among groups, for example, to make inferences about philopatry and dispersal by comparing average relatedness within groups to average relatedness among groups, or by comparing average within‐group relatedness values for male and female dyads (Janečka et al. 2007; Li and Merilä 2010; Städele et al. 2015). However, it is often interesting to know whether a certain variable is dependent on relatedness, such as whether more closely related individuals are in closer spatial proximity, have more similar phenotypes, or more often show certain dyadic behaviors. In studies of wild populations, for which experimental hypothesis testing is usually not possible, correlational hypothesis testing is used to infer causal relationships. The Mantel test, which assesses the correlation between two distance matrices, has been used to test for nonrandom correlations between the dyadic relatedness coefficient and other variables, such as spatial distance, acoustic similarity (Lemasson et al. 2011), similarity of odor profiles (Boulet et al. 2009), or association strength and mtDNA sharing (Wiszniewski et al. 2010). Extensions of the simple Mantel test (Mantel 1967), such as the partial Mantel test (Manly 1986) or multiple regression on distance matrices (Smouse et al. 1986), allow for testing the correlation between two variables while controlling for one or more other variables. Although Mantel tests have been widely used in ecological studies to assess the relationship between geographic and genetic distance, their statistical appropriateness has been critically discussed, and particularly, the extensions of the simple Mantel test have been criticized in terms of low power and inflated type I error rates (Guillot and Rousset 2013; Legendre et al. 2015). Generally, it seems that simple Mantel tests yield unbiased results if the assumptions of the test are met, including linearity, homoscedasticity, and a lack of autocorrelation of the permuted variable (Guillot and Rousset 2013). However, the limitations of the Mantel test make it useful only for the investigation of simple hypotheses.

Generalized linear models allow the assessment of more complex hypotheses by testing the influence of several continuous or categorical variables on a response variable, including interactions or non‐normal error structures, for example, whether there is an influence of genetic relatedness on spatial proximity and whether this differs between male and female dyads. Data collected from wild populations will rarely correspond to standard experimental designs in which each individual is part of only one dyad; instead, some or all individuals are part of many dyads. To control for the nonindependence of the data introduced by these multiple “observations”, random effects can be included in (generalized) linear mixed models. For dyadic data, this is not always trivial because the individuals within a dyad are often indistinguishable in that they cannot be placed in a meaningful order, such as male and female or aggressor and aggressed (Kenny et al. 2006), and so the assignment of each individual to one of the two random effects for a dyad is arbitrary. A possible solution is to repeatedly randomize this assignment and report model results averaged over the randomizations (e.g., Van Leeuwen et al. 2012).

It is important to note that the sample size and the variance in the relatedness composition of the population determine the power of any test correlating dyadic relatedness coefficients with other variables. In particular, wild populations with complex kinship structures may contain a very low proportion of highly related dyads (Csilléry et al. 2006; Van Horn et al. 2008). Furthermore, kin‐biased behaviors might only be expressed toward one type of kin but not another, even though the two kin categories have the same mean genetic relatedness (e.g., parent–offspring and full siblings). Researchers should be aware of this reduced power when interpreting nonsignificant results.
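To make the permutation logic of the simple Mantel test concrete, the following Python sketch correlates the off‐diagonal entries of two symmetric dyadic matrices and builds a null distribution by jointly permuting the rows and columns of one of them. It is an illustrative sketch only: the function names, the default of 999 permutations, and the two‐sided p‐value are assumptions made here, both matrices are assumed to be symmetric with non‐constant off‐diagonal values, and the caveats about the test's assumptions discussed above apply unchanged.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(ss_u * ss_v)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabeling individuals by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(x, y, n_perm=999, seed=0):
    """Simple Mantel test for two symmetric n x n matrices given as nested lists.

    Returns the observed correlation and a two-sided permutation p-value.
    """
    n = len(x)
    identity = list(range(n))
    x_vec = upper_triangle(x, identity)
    r_obs = pearson(x_vec, upper_triangle(y, identity))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute individuals, i.e., rows and columns together
        r_perm = pearson(x_vec, upper_triangle(y, order))
        if abs(r_perm) >= abs(r_obs):
            extreme += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (extreme + 1) / (n_perm + 1)
```

Because relatedness is a similarity rather than a distance, a tendency for kin to be spatially closer would appear as a negative correlation between a relatedness matrix and a spatial distance matrix; the two‐sided p‐value used here does not depend on that sign convention.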

Future Directions and Challenges

Work in humans aimed at identifying relationships in human pedigree data illustrates the potential of large‐scale data from SNP arrays or whole‐genome sequencing. For example, relatives up to the third degree can be identified with extremely low rates of misclassification using 500k SNPs (Manichaikul et al. 2010). Relationships can be detected up to the fifth degree with high accuracy using thousands of unlinked SNPs (Kling et al. 2012), and incorporation of linkage information among SNPs may distinguish between different relationships possessing the same degree of relatedness (Kyriazopoulou‐Panagiotopoulou et al. 2011). Stretches of sequence identical by descent inferred from whole‐genome sequence data may resolve relationships up to the fifth degree (Huff et al. 2011; Li et al. 2014). Thus, even distantly related dyads can be accurately classified if a large number of markers, linkage information, or whole‐genome sequence data can be obtained. Programs for pedigree reconstruction should eliminate the need for dyadic relationship classification while at the same time clearly defining the type and not just the degree of the relationship. However, these programs currently either do not accommodate complex social systems with nonmonogamy or ungenotyped individuals central to the pedigree, assume that sampled individuals are in the same generation, or require large‐scale SNP or whole‐genome sequencing data (Riester et al. 2009; Kirkpatrick et al. 2011; Cussens et al. 2013; He and Eskin 2014; Staples et al. 2014).

Only in recent years have researchers employed large‐scale cross‐species SNP‐typing approaches, such as SNP chips, which make it possible to genotype large arrays of SNPs for species other than model organisms and commercial species (Pertoldi et al. 2010; Ogden et al. 2012; Hoffman et al. 2013). However, recovery rates for polymorphic SNPs may be low even in closely related species, as in a study in which cross‐amplification of DNA from bighorn sheep (Ovis canadensis) on a domestic sheep (Ovis aries) SNP array of 49,035 loci, the two species having diverged ~2.6 Mya, yielded only 561 polymorphic SNPs (1.1%) (Miller et al. 2011). Few studies have used large SNP arrays for low‐quality DNA (Decker et al. 2009). However, efficient semi‐automated smaller‐scale approaches may yield genotypes of ~100 SNPs from low‐quality fecal or hair DNA (Kraus et al. 2015; Norman and Spong 2015). The potential of next‐generation high‐throughput sequencing to generate large amounts of sequence data from even low‐quality samples, such as ancient DNA, would suggest that large numbers of noninvasive samples could soon be efficiently genotyped or sequenced on a genomic scale. However, demonstrations of efficient low‐cost genome‐level sequencing from noninvasively obtained low‐quality samples are thus far limited in scale (Perry et al. 2010; Chiou and Bergey 2015; Snyder‐Mackler et al. 2016). One recent study used DNA from 62 baboon fecal samples to produce low‐coverage genomes (0.49×) and infer paternity for 27 offspring (Snyder‐Mackler et al. 2016). The authors estimated a cost of 200 USD per individual for 1× coverage, approximately twice the amount needed to generate comparably effective microsatellite genotypes for paternity inference. Higher‐coverage genomes may be obtained through improvements in the enrichment of host DNA from fecal samples, although prescreening of DNA samples for the minority containing relatively higher proportions of host DNA remains important (Chiou and Bergey 2015). Methodologies for efficient and cost‐effective genotyping from low‐quality DNA are only starting to be developed for larger panels of SNPs and whole‐genome sequencing, and particularly for the latter, an enormous amount of know‐how and postprocessing is necessary. Therefore, microsatellites will, at least in the near future, remain the marker of choice for most studies of wild populations for which only low‐quality DNA can be obtained, and determining kinship in these populations will continue to be challenging.

Conflict of Interest

None declared.

Table S1. Numbers of microsatellites (STRs) and single‐nucleotide polymorphisms (SNPs) with equal power for kinship analyses.

85 in total

1. Levels of polymorphism on the sex-limited chromosome: a clue to Y from W?

Authors: Hans Ellegren
Journal: Bioessays Date: 2003-02 Impact factor: 4.345

2. Performance of marker-based relatedness estimators in natural populations of outbred vertebrates.

Authors: Katalin Csilléry; Toby Johnson; Dario Beraldi; Tim Clutton-Brock; Dave Coltman; Bengt Hansson; Goran Spong; Josephine M Pemberton
Journal: Genetics Date: 2006-06-18 Impact factor: 4.562

3. Parentage and sibship inference from multilocus genotype data under polygamy.

Authors: J Wang; A W Santure
Journal: Genetics Date: 2009-02-16 Impact factor: 4.562

4. A practical guide to methods of parentage analysis.

Authors: Adam G Jones; Clayton M Small; Kimberly A Paczolt; Nicholas L Ratterman
Journal: Mol Ecol Resour Date: 2009-10-22 Impact factor: 7.090

5. The quest for Y-chromosomal markers - methodological strategies for mammalian non-model organisms.

Authors: Maja P Greminger; Michael Krützen; Claude Schelling; Aldona Pienkowska-Schelling; Peter Wandeler
Journal: Mol Ecol Resour Date: 2009-12-28 Impact factor: 7.090

6. The detection of disease clustering and a generalized regression approach.

Authors: N Mantel
Journal: Cancer Res Date: 1967-02 Impact factor: 12.701

7. Measuring individual inbreeding in the age of genomics: marker-based measures are better than pedigrees.

Authors: M Kardos; G Luikart; F W Allendorf
Journal: Heredity (Edinb) Date: 2015-03-18 Impact factor: 3.821

8. Fast half-sibling population reconstruction: theory and algorithms.

Authors: Daniel Dexter; Daniel G Brown
Journal: Algorithms Mol Biol Date: 2013-07-12 Impact factor: 1.405

9. Accurate and robust prediction of genetic relationship from whole-genome sequences.

Authors: Hong Li; Gustavo Glusman; Chad Huff; Juan Caballero; Jared C Roach
Journal: PLoS One Date: 2014-02-28 Impact factor: 3.240

10. Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples.

Authors: Noah Snyder-Mackler; William H Majoros; Michael L Yuan; Amanda O Shaver; Jacob B Gordon; Gisela H Kopp; Stephen A Schlebusch; Jeffrey D Wall; Susan C Alberts; Sayan Mukherjee; Xiang Zhou; Jenny Tung
Journal: Genetics Date: 2016-04-20 Impact factor: 4.562
