Michael J Sheehan1, Michael W Nachman1. 1. Museum of Comparative Zoology and Integrative Biology, University of California, 3101 Valley Life Science Building, Berkeley, California 94720, USA.
Abstract
Facial recognition plays a key role in human interactions, and there has been great interest in understanding the evolution of human abilities for individual recognition and tracking social relationships. Individual recognition requires sufficient cognitive abilities and phenotypic diversity within a population for discrimination to be possible. Despite the importance of facial recognition in humans, the evolution of facial identity has received little attention. Here we demonstrate that faces evolved to signal individual identity under negative frequency-dependent selection. Faces show elevated phenotypic variation and lower between-trait correlations compared with other traits. Regions surrounding face-associated single nucleotide polymorphisms show elevated diversity consistent with frequency-dependent selection. Genetic variation maintained by identity signalling tends to be shared across populations and, for some loci, predates the origin of Homo sapiens. Studies of human social evolution tend to emphasize cognitive adaptations, but we show that social evolution has shaped patterns of human phenotypic and genetic diversity as well.
Human societies are predicated on our abilities to individually recognize and track scores of people in our social networks[1,2]. The complexity of human societies is widely recognized as a major selective force that has shaped our cognitive abilities and social intelligence[3-5]. Indeed, humans have highly developed individual recognition abilities, and there is a rich literature examining social cognition and individual recognition in humans[6-9]. In particular, facial recognition plays a critical role in human social interactions, and the cognitive mechanisms underlying facial recognition have been well studied[9-11]. When it comes to recognition, however, cognition is only one half of the equation. Recognition also depends on phenotypic variation in a population[12], without which discrimination is impossible. Compared to other animals or other parts of our bodies, we perceive human faces as being unusually variable and easy to recognize (Fig. 1). While this phenomenon can be at least partly explained by our specialization for learning human faces[10], the fact that facial recognition is so important for social interactions among humans suggests that selection may have led to increased facial distinctiveness. Despite the striking differences among human faces and the importance of facial identity for human society, the evolution of individuality in human faces has yet to be explored.
Figure 1
Humans have much more individually distinctive faces than many animals
(A) Human populations show extensive variability in facial morphology that is used for individual recognition. Patterns of elevated variability are even maintained in more genetically homogeneous populations such as the Finnish, as demonstrated by the portraits of six male soldiers. (B) In contrast to the variability present in human faces, many animals such as king penguins have much more uniform appearances. While king penguins are not known to visually recognize individuals, they do have highly distinctive vocalizations that are used for individual recognition. (Photo credits: SA-kuva, Finnish Armed Forces photograph; Wikimedia Commons)
Theoretical and empirical studies have proposed multiple non-mutually exclusive negative frequency-dependent selection (NFDS) pressures that may maintain elevated phenotypic diversity in natural populations. These include apostatic selection, in which predation is lower on rare prey[13,14], mating preferences for novel phenotypes[15] and selection to be recognizable[16]. For example, both apostatic selection and frequency-dependent mate preferences have been shown to contribute to the maintenance of the highly variable and heritable male coloration patterns in the guppy (Poecilia reticulata)[14,17]. Frequency-dependent attractiveness has been proposed as a mechanism to explain patterns of hair and eye color diversity in Europeans[18] and recent tests have found empirical support for frequency-dependent attractiveness of beards in human males[19]. Whereas elevated variation in guppy coloration is limited to adult males[17], human facial individuality is not limited to a particular age or sex-class, suggesting that frequency-dependent mate preferences are unlikely to be the sole or major driver of elevated diversity in human facial patterns. Indeed, individual recognition is important in humans from cradle to grave across multiple contexts. Selection to be individually recognizable in a variety of scenarios is therefore a prime hypothesis to explain the high diversity in human facial appearance[12].
Individual recognition will only evolve when it is beneficial to identify other individuals[12]. Whether or not individuals benefit by being identifiable and easily recognized raises a different question. Traits used for individual recognition are expected to evolve as either identity cues or as identity signals depending on the benefits of being recognized[12,20,21]. Identity cues are traits that allow discrimination but have not evolved for the purpose of recognition[22] and are not expected to show signatures of adaptive evolution. Cues are essentially inadvertent phenotypic variation that other individuals can use for discrimination[23,24]. For example, today human fingerprints are used for forensic identification though they have not evolved to facilitate recognition. As is the case for fingerprints, identity cues do not necessarily benefit individuals that are being recognized and may in fact harm them. In contrast, identity signals are traits that have been selected to facilitate individual recognition and as a result show elevated variation within populations[16,25]. Individual recognition can rely on cues alone, but if individuals benefit from being recognized then selection is expected to favor individuals that advertise their identity with distinctive phenotypes[12,20,21]. Identity signals evolve when being confused with others is costly due to misdirected behaviors including aggression[26], mating opportunities[27], parental care[28], etc. Comparative and experimental evidence for identity signaling leading to increased phenotypic diversity has been documented in multiple taxa[25,26,28,29] but has not been investigated in humans.
Individual recognition is facilitated when individuals display divergent trait values and novel combinations of traits[11,21], leading to disruptive selection on multiple traits and the evolution of independent developmental pathways[21,26]. Three key predictions of the identity signaling hypothesis are (i) facial characteristics should be more variable than other visible traits not used for recognition, (ii) face traits are expected to show lower inter-trait correlations compared to other morphological traits, and (iii) loci underlying normal facial variation are expected to show elevated genetic diversity consistent with NFDS favoring rare phenotypes. Loci contributing to identity-signaling traits are expected to show evidence of NFDS, such as an excess of intermediate-frequency alleles and elevated diversity when controlling for divergence[30,31]. Selection for identity signaling on any one facial trait is likely to be relatively weak as there are numerous traits that contribute to individual identity. Due to the complex genetic architecture of facial variation[32-34] we expect a modest signature of elevated diversity in genomic regions underlying identity signals as a whole[35,36], though there may be stronger evidence for NFDS at a subset of loci. While it is plausible that identity cues could also show elevated phenotypic variation and reduced phenotypic correlations as a result of relaxed selection or stochastic developmental processes, the loci underlying identity cues are expected to evolve neutrally. Thus even a weak signature of NFDS, as may be expected for a complex quantitative trait such as facial identity, would reject the cue hypothesis and provide support for identity signaling.
Consistent with the predictions of the identity-signaling hypothesis, we find elevated phenotypic variation and reduced levels of inter-trait correlations in human faces compared to non-facial morphology. Furthermore, we find population genomic support for the identity-signaling hypothesis. Loci associated with variation in normal facial morphology show elevated nucleotide diversity compared to loci associated with variation in height or presumably neutral, intergenic variation. The loci with the strongest evidence of selection tend to be shared across continents, suggesting that selection on at least some loci associated with identity signaling is likely to be old. Indeed, by comparing sequences of modern humans to those of Neanderthals and Denisovans, we demonstrate that variation at some loci associated with facial morphology predates the origin of the human species. While studies of human social evolution have tended to emphasize its effect on cognition, our results suggest that social evolution has also played an important role in shaping human morphology.
Results
Morphological evidence
Morphological comparisons between faces and other traits are consistent with the predictions of identity signaling. We tested these predictions using data for 18 facial and 46 non-facial linear distance measures from the ANSUR anthropometric study of US army personnel[37] for females and males of African and European American ancestry respectively (Supplementary Tables 1-2). Linear distances between facial landmarks have higher coefficients of variation than linear measurements of body traits in every group (Fig. 2a, Mann-Whitney U (MWU) test, n = 18 facial and 46 non-facial measures, P < 0.03 for all comparisons). Without selection for uncoupled development, traits within individuals are generally correlated, as larger individuals tend to have larger traits[38,39]. However, facial traits show lower inter-trait correlation coefficients than body traits in all four groups as predicted by the identity-signaling hypothesis (Fig. 2b, MWU test, n = 153 facial correlations and 1035 non-facial correlations, P < 0.001 for all comparisons). Indeed the vast majority of the body measures are correlated (Percentage of significant Pearson's correlations between traits, AAF = 95.17%, AAM = 99.14%, EAF = 96.62%, EAM = 99.81%, n = 1035 pairwise comparisons), though many fewer facial traits are correlated with each other (AAF = 63.4%, AAM = 73.9%, EAF = 47.1%, EAM = 84.2%, n = 153 pairwise comparisons, Z-ratio < -11 and P < 0.002 for all comparisons). Uncorrelated values for face traits increase the diversity of facial phenotypes, facilitating recognition. For example, the breadth and length of hands are correlated (Fig. 2c, r = 0.30, P<0.0001), though the breadth and length of noses are not (Fig. 2d, r = 0.002, P = 0.06). These results add to previous findings that humans have among the lowest craniofacial morphological integration among primates and mammals more broadly[40].
Figure 2
Morphological evidence that human faces have evolved to signal individual identity
Morphological comparisons of facial features to other aspects of body morphology are consistent with selection for identity signals. (A) In all four groups examined facial traits have higher coefficients of variation than other body traits (P < 0.03 for all comparisons). (B) Facial traits as a group show lower inter-trait correlations than non-facial traits in all four populations examined (P < 0.001 for all comparisons). (C) For most traits, such as hands, larger individuals have larger traits such that the width and length of an individual's hand are correlated. (D) In contrast to hands, the width and length of the nose are not correlated. Box-plots show median and 25th and 75th percentiles (N = 181 African American females; 457 African American males; 204 European American females; 1168 European American males). The scatterplots show the trait values for European American male service members measured in the ANSUR II dataset. Best-fit lines are shown for significant regressions.
Population genomic evidence
Using data from the 1000 Genomes Project[41], we tested for a signature of NFDS in genomic regions surrounding SNPs previously associated with differences in normal facial morphology in Europeans[32,33]. We compared the distribution of population genetic summary statistics calculated around face SNPs to the distribution of summary statistics for 5000 putatively neutral intergenic regions as identified by the Neutral Region Explorer[42]. Here we present an analysis based on 2kb windows, though the patterns of diversity reported here are robust to a range of window sizes (Supplementary Fig. 1). Elevated diversity in face regions relative to the intergenic regions would be consistent with the predictions of NFDS. However, it is possible that morphological traits in general could show elevated diversity in comparison to neutral regions, so we also compared face regions to regions surrounding SNPs associated with height[43], another complex morphological trait, as an additional control.
Patterns of diversity surrounding face-associated SNPs are consistent with NFDS on complex quantitative traits as predicted under the identity-signaling hypothesis. Here we present the values for the Finnish population (Fig. 3), though broadly similar overall patterns are found for other 1000 Genomes population samples from Europe and Africa and to a weaker extent Asia (Supplementary Fig. 2-9). The folded site-frequency spectrum shows that regions surrounding face-associated SNPs have an excess of intermediate frequency variants compared to the two sets of control loci (Fig. 3a, MWU, P < 0.0001 for both comparisons). Additionally, the distribution of summary statistics for faces differs from distributions found for height or intergenic regions consistent with NFDS on facial traits (Fig. 3b-e, P < 0.05 for all comparisons). One possible confounding factor is that intermediate-frequency SNPs are overrepresented in genotyping panels[44] and thus more often associated with traits in genome-wide association studies, so elevated diversity could conceivably be confounded by ascertainment biases. Two lines of evidence argue against this. First, the association studies and population genomic analyses were conducted in different samples of Europeans, and the patterns of elevated diversity around face SNPs are also found in African and Asian populations. Second, in the Finnish sample examined here the minor allele frequency of the focal SNPs is actually lower for faces than for height (MWU, P = 0.028, Supplementary Fig. 10), suggesting that potential biases in association studies cannot explain the elevated patterns of diversity surrounding face-associated SNPs. The combination of elevated morphological and genetic diversity associated with human faces rejects a neutral explanation for human facial individuality and instead supports the hypothesis that human facial diversity is the product of selection for identity signaling in humans.
Figure 3
Population genomic evidence that human faces have evolved to signal individual identity
Genomic regions associated with facial morphology show evidence of selection for identity signaling in the Finnish. (A) Face regions (N=59) have elevated levels of intermediate-frequency alleles compared to neutral regions (N=5000) or genomic regions associated with variation in height (N=365). The bar graph shows the proportion of SNPs within each allele-frequency bin. (B) Additionally, face regions have elevated levels of π, (C) even after controlling for differing rates of divergence among loci. (D) Similarly, face regions show an elevated number of segregating sites, measured as Watterson's θ. (E) Tajima's D is higher in facial regions than neutral regions while (F) Fu and Li's D* is higher in facial regions than height regions. Box-plots show medians and 25th and 75th percentiles. Whiskers show the 5th and 95th percentiles. Outliers are not shown so that the main distributions can be viewed at larger size. The P-values shown are from one-tailed Mann-Whitney U tests. Note that sample sizes are reduced for tests corrected for divergence, as alignments were not available for all regions considered (N = 58 face loci, 356 height loci, 4873 neutral loci).
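The summary statistics named in this caption can be illustrated with a minimal sketch. It assumes each segregating site in a window is summarized by the count of one allele among n sampled chromosomes, and it uses the standard textbook formulas for π, Watterson's θ and Tajima's D. The function names and the toy allele counts are hypothetical, and this is not the pipeline used for the analyses reported here.

```python
import math
from collections import Counter


def folded_sfs(allele_counts, n):
    """Number of sites at each minor-allele count 1 .. n // 2."""
    folded = Counter(min(k, n - k) for k in allele_counts if 0 < k < n)
    return [folded.get(i, 0) for i in range(1, n // 2 + 1)]


def nucleotide_diversity(allele_counts, n):
    """pi: mean number of pairwise differences, summed over sites."""
    return sum(2 * k * (n - k) / (n * (n - 1)) for k in allele_counts)


def wattersons_theta(num_segregating, n):
    """Watterson's theta: segregating sites scaled by the harmonic number a1."""
    a1 = sum(1 / i for i in range(1, n))
    return num_segregating / a1


def tajimas_d(allele_counts, n):
    """Tajima's D: standardized difference between pi and Watterson's theta."""
    segregating = [k for k in allele_counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = nucleotide_diversity(segregating, n)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy window (made-up data): 16 segregating sites in 10 chromosomes,
# mostly singletons, which yields a negative Tajima's D.
example_counts = [1] * 13 + [2, 3, 3]
print(folded_sfs(example_counts, 10))
print(round(tajimas_d(example_counts, 10), 3))
```

An excess of intermediate-frequency alleles, as expected under NFDS, shifts weight toward the right-hand bins of the folded spectrum and raises π relative to Watterson's θ, which makes Tajima's D more positive.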
Evolutionary dynamics of identity signaling loci
Due to the lack of data on SNPs associated with signaling traits in animals, population genomic methods have not previously been used to empirically explore the evolutionary dynamics of signaling traits. The present dataset on identity signals in humans, however, provides an unprecedented opportunity to examine the history of selection on signaling traits used in social communication. Selection for identity signaling is expected to act on faces in all populations though it need not occur at the same loci. Conceivably, selection may act on the same loci across populations; different populations could maintain diversity at distinct loci underlying the same trait; or selection may act on loci underlying different traits in each population depending on the dynamics of selection as human populations expanded across the globe. We explored this question by assessing whether loci showing elevated diversity, where both π corrected for divergence with macaques and Tajima's D exceed the 95th percentile, were shared across populations. Indeed a disproportionate number of loci show evidence of elevated diversity in at least one population for faces (9/58) compared to height (6/356; χ2 = 23.5, P < 0.0001) and intergenic regions (57/4873; χ2 = 78.8, P < 0.0001, Fig. 4). Furthermore, the regions that show elevated diversity for faces are more consistent across continents than expected; 5 of 9 regions show elevated diversity on at least two continents compared to 1 of 57 intergenic regions (χ2 = 27.2, P < 0.0002, Fig. 4). All 9 regions identified as having elevated diversity in at least one population have high levels of π/divergence (>90th percentile) in both African populations examined here (Supplementary Table 3). Additionally, analyses of the haplotype networks for the 9 regions show greater allelic diversity in African populations, with European and Asian populations carrying a subset of the African haplotypes (Supplementary Fig. 11-19). 
These patterns of diversity and haplotype sharing across populations are consistent with an African origin of allelic variation at identity signaling loci predating human migration out of Africa. Population differentiation in facial morphology appears in part to be the result of differential loss of diversity in non-African populations, consistent with reduced morphological variation in populations with increased distance to Africa[45].
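The contingency comparisons reported above can be checked with a short sketch of a 2×2 χ2 test written with the standard library only. The function names are hypothetical. The text does not state whether a continuity correction was applied; with the counts given, the first two reported statistics are reproduced with Yates' correction and the third without it, so the `yates` flag is left to the caller.

```python
import math


def chi_square_2x2(hits_a, total_a, hits_b, total_b, yates=True):
    """Chi-square statistic for a 2x2 table built from two proportions."""
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction, floored at zero
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Loci with elevated diversity in at least one population
face_vs_height = chi_square_2x2(9, 58, 6, 356)        # about 23.5
face_vs_neutral = chi_square_2x2(9, 58, 57, 4873)     # about 78.8
# Elevated loci shared across at least two continents (uncorrected)
shared = chi_square_2x2(5, 9, 1, 57, yates=False)     # about 27.2

for stat in (face_vs_height, face_vs_neutral, shared):
    print(round(stat, 1), p_value_1df(stat))
```

Because several expected cell counts are small (for example, about two face loci expected in the face versus height table), an exact test would be a reasonable alternative to the χ2 approximation.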
Figure 4
Patterns of elevated diversity in face-associated loci across populations
The face-associated loci with elevated diversity consistent with selection for identity signaling tend to be shared across populations both within and between continents. The heatmap highlights loci on the extreme ends of the distributions for π (controlling for divergence with macaque) and Tajima's D. Columns correspond to populations and rows correspond to individual loci. Squares that are fully filled in with dark blue designate loci with evidence of elevated diversity (>95th percentile for both summary statistics). A greater number of loci show evidence of elevated diversity in at least one population for faces (9/58) compared to height (6/356; χ2 = 23.5, P < 0.0001) and intergenic regions (57/4873; χ2 = 78.8, P < 0.0001). Additionally, patterns of elevated diversity are more consistently shared across populations for face-associated regions compared to the neutral regions (5/9 face regions versus 1/57 neutral regions, χ2 = 27.2, P <0.0002). To facilitate visual comparison representative subsamples of height and intergenic regions are shown here. Subsamples were generated by randomly selecting loci from the height and neutral lists, which we confirmed did not deviate from the distribution of the total sample. All analyses reported were conducted on the full datasets.
Here we present two examples highlighting the complex evolutionary trajectories of loci involved in identity signaling in humans. The examples illustrate (i) that genetic variation underlying identity signals tends to be old and of African origin and (ii) that phenotypic divergence between non-African populations is partly related to the differential loss of ancestral variation (Fig. 5). Variants associated with the distance between the chin and bridge of the nose[33] are found within an intron of TMCT2. A sliding window analysis of the region demonstrates that there is elevated diversity and reduced Fst consistent with sustained selection for identity signaling that is common to the three continental groups or occurred in their ancestral population (Fig. 5a). In contrast to the shared diversity at TMCT2, intronic variants of SDK1 associated with nasal morphology[32] show a clear reduction of nucleotide diversity in Asian populations compared to the elevated diversity found in African populations. The reduction in diversity in Asian populations and increased Fst between Asian and African populations could be the result of either loss of diversity during population bottlenecks or directional selection on nasal morphology in Asian populations (Fig. 5b). For both loci we constructed gene trees for the 5kb window with the highest level of nucleotide diversity for 30 modern human sequences as well as the Neanderthal, Denisovan and chimpanzee sequences (Fig. 5c-d). Both trees provide further evidence for the ancient origins of loci under selection for identity signaling as archaic hominin species are nested within modern human diversity. This result suggests that selection on some loci associated with identity signaling predates the origin of Homo sapiens and the emergence of modern facial morphology.
Figure 5
Evolutionary history of example face-associated loci
Patterns of genetic diversity associated with facial morphology at TMCT2 and SDK1. (A) At TMCT2 variation is largely shared across continents, while (B) at SDK1 variation has been lost mainly in the CHB population. The sliding window analyses (A - B) show nucleotide diversity for three 1000 Genomes populations representing Europe (FIN), Asia (CHB) and Africa (YRI) respectively for 5kb windows at 1kb sliding intervals. Nucleotide diversity is shown with solid lines while Fst is represented by dotted lines. Color of the lines represents the population examined for π (FIN = blue, CHB = black, YRI = red) or the two-population Fst comparisons (FIN - YRI = red, CHB - YRI = black, FIN - CHB = blue). The locations of SNPs associated with facial morphology are shown as blue circles except for the focal SNP included in other window-based analyses, which is denoted with a red circle. The UCSC Genome Browser tracks showing the locations of exons and three ENCODE regulatory regions, which show regions likely associated with genomic features involved in gene regulation, are shown below the sliding window. (C - D) Maximum-likelihood trees show the relationships among 10 modern humans sampled from each of three populations (FIN, CHB and YRI) as well as sequences from Denisovan, Neanderthal and chimpanzee. The modern human sequences are colored according to their population of origin (FIN = blue, CHB = black, YRI = red). The region analyzed was the 5kb window with the highest nucleotide diversity as determined by the sliding window analysis. Note that in both cases, the sequences for archaic hominins are nested within modern human diversity, indicating the origin of the major haplogroups predates the evolution of Homo sapiens.
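The sliding-window scheme described in this caption (5kb windows moved in 1kb steps) can be sketched as follows. The input format, a list of (position, allele count) pairs for one population, and the function name are assumptions made for illustration; per-site π is obtained here by dividing by the window length, which ignores any masking of uncallable sites.

```python
def sliding_window_pi(sites, n, region_start, region_end, window=5000, step=1000):
    """Per-site nucleotide diversity in overlapping windows.

    sites: iterable of (position, allele_count) for segregating sites
    n: number of chromosomes sampled in the population
    Returns a list of (window_start, pi_per_site).
    """
    results = []
    start = region_start
    while start + window <= region_end:
        stop = start + window
        total = sum(
            2 * k * (n - k) / (n * (n - 1))
            for pos, k in sites
            if start <= pos < stop
        )
        results.append((start, total / window))
        start += step
    return results


# Toy region (made-up data): two segregating sites in 10 chromosomes
toy_sites = [(100, 5), (5500, 1)]
for window_start, pi in sliding_window_pi(toy_sites, 10, 0, 7000):
    print(window_start, pi)
```

The window with the largest value returned by such a scan corresponds to the kind of region selected for the gene trees in panels C and D.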
Discussion
Here we have presented both morphological and population genomic evidence consistent with the hypothesis that selection for individual identity signals has shaped patterns of human facial diversity. Though the evidence for selection at individual loci is modest, as expected for molecular evolution of polygenic traits[35], the combination of morphological and genomic data from multiple populations clearly rejects the identity cue hypothesis and provides compelling evidence consistent with the idea that selection for individual identity signaling has shaped patterns of facial morphology in humans. Provided that the variation used in identity signals is not developmentally costly to produce or maintain, even a small selective advantage of individuality is expected to give rise to elevated phenotypic diversity when confusion is costly[21]. Previous studies have shown that being confused with others may be costly in a range of circumstances including within social hierarchies in Polistes wasps[26], sexual selection in house mice[27], and parent-offspring interactions in cliff swallows[28]. It is unknown at present which aspects of human sociality have been the most important sources of selection for identity signaling, though it is likely that multiple facets of social interactions contribute to selection for identity signals. Individual recognition and discriminating among individuals play a role in shaping important human behaviors including kin recognition[46], investment in offspring[47], and cooperation[48]. It is likely that many social contexts favor identity signaling in humans, so it will be important for future research to explore the relative benefits of individuality across many social contexts and developmental stages in humans.
In addition to selection for identity signaling it is possible that other frequency-dependent processes such as preferences for mates with rare or novel features could have played a role in shaping human diversity. For example, a recent study showed frequency-dependent effects on the attractiveness of male facial hair-styles[19]. Preferences for individuals with rare phenotypes have also been shown in other animals, such as guppies where rare phenotypes confer a survival advantage due to reduced predation[14,17]. In humans, females tend to advertise physical attractiveness to mates more prominently than do males, who tend to advertise resources or performance ability[49,50]. Thus if frequency-dependent mate preferences were the major driver in determining facial identity, then females might be expected to show elevated levels of individuality compared to males just as female preferences for novel individuals contribute to the elevated color pattern variation seen in male guppies[17]. However, in humans both males and females show elevated individuality in faces compared to other external morphology (Fig. 2), suggesting that mate preferences alone cannot explain the patterns observed here. Similarly, mate preferences might be expected to drive variation only in adults as is observed in guppies[17], yet distinctive facial morphology is seen at all life stages in humans. To the extent that frequency-dependent mate preferences play a role in shaping patterns of facial individuality it is likely that mating preferences and identity signaling would have a positive feedback. If individual distinctiveness is beneficial in non-sexual contexts, preferences for mates with rare phenotypes may then also provide an additional benefit to distinctiveness[15]. Finally, our data do not preclude potential directional or stabilizing selection pressures that may arise from other potential mating preferences[51] or climate[52] on particular features of human facial morphology, though directional and stabilizing selection do not predict elevated genetic diversity within populations at the associated loci and so cannot explain the patterns of elevated genetic variation we have documented here.
It is important to note that facial recognition is widespread in primates[53] and identity signaling is unlikely to be limited to human facial morphology, though the loci under selection may vary considerably across species. This may be especially true for humans, who have undergone considerable directional evolution of facial form during the course of hominin evolution[54]. While faces are a key feature used in human social recognition, other traits such as our voices also contribute to recognition and may have also experienced selection for identity signaling. Additionally, the strength of selection for particular identity signaling traits may have changed over time in modern humans as cultural practices gave rise to individually distinctive clothing and hairstyles, which provide additional cues to identity. Traditional treatments of social selection in human evolution have emphasized the potential role for social interactions in shaping our cognitive abilities[3], though our work demonstrates that social selection has shaped our morphology as well to facilitate social recognition. Importantly, our work draws a link between social interactions and the maintenance of genetic variation underlying traits used in social recognition. Social recognition is found across disparate animal taxa, suggesting that selection for identity signaling is likely to be a common mechanism generating phenotypic variation and maintaining genetic variation.
Methods
Morphological Analyses
We examined morphological relationships among body parts and facial features using published anthropometric datasets. We focused our analyses on the ANSUR II dataset because it provides a large, consistent database of individual-level facial and body measurements. We analyzed the linear anthropometric measurements (Supplementary Tables 1-2). In our analysis we considered four groups of service members based on their sex and racial identity: African American females (n = 181, mean height = 64.29 ± 0.17 inches), African American males (n = 457, mean height = 69.12 ± 0.13 inches), European American females (n = 204, mean height = 64.27 ± 0.15 inches), and European American males (n = 1168, mean height = 69.32 ± 0.08 inches). Compared to the general civilian population, the individuals measured in the ANSUR II dataset tend to be taller and have lower levels of body fat[55]. Neither of these factors should influence our results or conclusions because our comparisons use facial and body measurements from the same individuals. Identity signaling predicts that traits used for recognition will have greater variance and be less correlated with each other compared to non-recognition traits in the same group of individuals.
Using the ANSUR II dataset we tested two predictions of the identity-signaling hypothesis. First, we considered the levels of variation in each trait by calculating the coefficient of variation, that is, the standard deviation of each trait divided by its mean. Coefficients of variation provide a scale-free method for comparing variation across samples that differ in average size, as is the case for human morphological data. Second, we considered the correlations among traits by calculating Pearson's correlations for all pairwise combinations of traits within each class of traits. To compare the distribution of correlation coefficients between bodies and faces we recorded the correlation coefficients significant at the P < 0.05 level. For any pair of traits that did not show a significant correlation at P < 0.05, we recorded the correlation as 0. Pearson's correlation test is sensitive to sample size, such that correlations are more likely to be significant when larger samples are used. Therefore, comparisons between the different groups considered should be made with caution because of differences in sample size. For example, differences in the percentage of significant pairwise comparisons between males and females likely reflect differences in sample sizes. Within a group, however, the same individuals were measured for both facial and body traits, providing a direct comparison of the relative degree of correlation among traits.
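The two morphological measures described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study. The function names are hypothetical, and the P value comes from a normal approximation based on Fisher's z-transform, an assumption made here so that the example needs only the standard library; an exact t-based test would differ slightly for small samples.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a scale-free measure)."""
    return stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences of measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P from Fisher's z-transform (normal approximation, assumed here)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def thresholded_correlations(traits, alpha=0.05):
    """Map each trait pair to r, or to 0.0 when the correlation is not significant.

    `traits` maps a trait name to its measurements, taken on the same
    individuals in the same order.
    """
    result = {}
    for a, b in combinations(sorted(traits), 2):
        r = pearson_r(traits[a], traits[b])
        p = approx_p_value(r, len(traits[a]))
        result[(a, b)] = r if p < alpha else 0.0
    return result
```

Applied separately to the facial and the body measurements of one group, `thresholded_correlations` would give the two distributions of coefficients that are compared within that group.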
Selection of Genomic Regions for Analysis
Face-associated SNPs were taken from two recent genome-wide association studies of normal facial morphology. Paternoster et al[33] conducted a discovery phase association study in which they examined the relationship between facial characteristics and more than 2.5 million imputed SNPs in a sample of 2,185 15-year-olds from the Avon Longitudinal Study of Parents and their Children (ALSPC)[56]. Only subjects who genetically clustered with the CEU HapMap population were included in their analysis. The study identified 30 loci associated with facial morphology at P < 5×10⁻⁷, which we examined in our study. Liu et al[32] examined the relationship between facial morphology and more than 2.5 million SNPs in a discovery phase association study of 5,388 adults. The samples in the Liu et al[32] study came from individuals of European ancestry living in the Netherlands, Australia, Canada, Germany and the United Kingdom. They identified 29 loci associated with facial morphology at P < 5×10⁻⁷, which we examined in this study. None of the SNPs identified by the two studies overlapped, providing a total of 59 loci for investigation. In both studies, multiple linked SNPs were often identified in association with a particular phenotype. When more than one SNP was associated with a trait we chose the SNP with the smallest P value within a 1 Mb region of a chromosome from the association study. The SNPs identified for further examination from the two studies include one from each of 59 loci distributed throughout the autosomes. The SNPs are largely intergenic (95%), though a few occur within introns (5%). None were located in coding regions.
We compared face-associated genomic regions to two sets of control regions. First, we examined SNPs associated with height taken from the GWAS Catalog of the National Human Genome Research Institute (www.genome.gov) on April 25, 2013. To prevent multiple sampling of any region we only considered SNPs that were separated by more than 4kb. In instances where multiple nearby SNPs had been associated with height, we chose the SNP with the smallest P value as reported in the GWAS Catalog. We excluded six SNPs associated with height that fall within the HLA region, though including these SNPs in our analyses does not alter our pattern of results. This produced a total of 365 loci associated with variation in height. Like faces, height is a composite character that depends on the morphology of numerous different bones. Additionally, both height and facial morphology have complex genetic bases with many loci of small effect contributing to overall phenotypic variation [43]. SNPs associated with height are predominantly located in intronic regions (54%) and intergenic regions (36%), with a smaller percentage found near the 3′ and 5′ ends of genes (6%) or in exons (3%). Second, we considered the genome-wide patterns of diversity by examining 5000 2kb intergenic regions. We identified putatively neutral intergenic regions in Europeans using the Neutral Region Explorer webserver [42]. The same set of intergenic regions was used for all populations.
We analyzed regions surrounding the SNPs identified by the association studies at a set window size. The causative mutations underlying the traits are not known, though they are likely to be located near the SNPs identified through genome-wide association studies[57]. The a priori best choice for a window size is not clear, though the patterns of elevated nucleotide diversity we observe are seen over a range of window sizes (Supplementary Fig. 1). We chose to analyze 1kb both upstream and downstream of the SNPs, providing windows of 2kb. Smaller window sizes show markedly increased variance in the summary statistics across loci (Supplementary Fig. 1), though this variance levels off at window sizes of 2kb or greater.
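The selection of one SNP per locus and the construction of windows around it can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the study: the greedy rule (keep the lowest P value, then discard neighbours within a minimum spacing) is one plausible reading of the description above, and the function names, record fields and coordinate convention are hypothetical.

```python
def pick_lead_snps(snps, min_gap):
    """Greedily keep the lowest-P SNP and drop others within `min_gap` bp.

    `snps` is a list of dicts with the hypothetical keys "id", "chrom",
    "pos" and "p". Two kept SNPs on the same chromosome are always more
    than `min_gap` bp apart.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        far_from_all = all(
            snp["chrom"] != k["chrom"] or abs(snp["pos"] - k["pos"]) > min_gap
            for k in kept
        )
        if far_from_all:
            kept.append(snp)
    return sorted(kept, key=lambda s: (s["chrom"], s["pos"]))


def window_around(snp, flank=1000):
    """Return (chrom, start, end) with `flank` bp on each side of the SNP.

    Read as a half-open interval [start, end), the window spans
    2 * flank bp (2kb for the default); this convention is an assumption.
    """
    return (snp["chrom"], max(0, snp["pos"] - flank), snp["pos"] + flank)
```

For example, `pick_lead_snps(face_snps, min_gap=1_000_000)` would mirror the 1 Mb rule described for the face-associated SNPs, and `min_gap=4_000` the spacing rule described for the height-associated SNPs, where `face_snps` is a hypothetical input list.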
Summary Statistics
We calculated summary statistics for each population using binary SNP and indel data from the 1000 Genomes Project Phase 1 variants. Nine non-admixed populations originating from Europe (CEU: Utah residents with Northern and Western European Ancestry; GBR: British from England and Scotland; FIN: Finnish from Finland; TSI: Toscani from Italia), Asia (CHB: Han Chinese in Beijing, China; CHS: Southern Han Chinese; JPT: Japanese from Tokyo, Japan) and Africa (LWK: Luhya from Webuye, Kenya; YRI: Yoruba from Ibadan, Nigeria) were considered in our study. We downloaded the population data to Galaxy [58] using the Table Browser function of the UCSC genome browser. We filtered the data based on the sets of 2kb windows for face, height and intergenic regions to produce three files for each population, which we subsequently examined using custom macros in Excel.
We used folded site frequency spectra to examine the distribution of minor allele frequencies among SNPs found within each of the demarcated regions. The expected distribution of allele frequencies at loci underlying a polygenic trait under negative frequency-dependent selection is unclear and will depend on the exact form of selection and the genetic architecture of the trait [59]. Nonetheless, frequency-dependent selection is expected to maintain alleles in a population, on average, longer than expected for neutral alleles[60]. Thus, the distribution of allele frequencies should differ from that expected in a stationary population at mutation-drift equilibrium. In particular, we expect fewer rare alleles under a scenario of frequency-dependent selection. Spectra were compared using the raw counts of SNPs with each minor allele frequency using a Mann-Whitney U test. To graph the folded site frequency spectra we binned data into ranges of minor allele frequencies.
In addition to the aggregated site frequency spectrum analysis, we also considered the distribution of multiple summary statistics of genetic diversity across the loci considered within our study. We calculated the following summary statistics for each 2kb window: π, π corrected for human-macaque divergence, Watterson's θ, Tajima's D and Fu and Li's D*. Both π and θ are estimators of the neutral mutation parameter, 4Neμ. π is based on the number of pairwise differences among sequences within a sample and θ is based on the proportion of segregating sites. Loci under frequency-dependent selection are expected to show elevated values for π because frequency-dependent selection maintains alleles over longer periods of time. Older alleles accumulate mutations and therefore show higher levels of pairwise sequence divergence. We also examined the distribution of π corrected for the rate of divergence between humans and macaques. Different regions of the genome are known to experience different rates of mutation [61]. Loci with higher mutation rates will show elevated levels of π. The rate of divergence between humans and macaques provides a means of estimating the relative differences in mutation rates among loci[62]. The maintenance of multiple alleles in a population under frequency-dependent selection is also expected to lead to higher estimates of θ. Tajima's D is the normalized difference between π and θ. Tajima's D takes on positive values when there is an excess of intermediate-frequency variants and negative values when there is an excess of rare variants. Fu and Li's D* is based on the number of nucleotide variants observed only once in a sample[63]. Negative values of Fu and Li's D* indicate an excess of singletons. Loci under frequency-dependent selection are expected to have a relatively smaller number of singletons and therefore more positive values of Fu and Li's D*.
We calculated the summary statistics using the allele frequencies given in the Phase 1 Variant files from the 1000 Genomes Project. The short indels recorded in the dataset were considered in the same manner as SNPs. Human-macaque divergence data were estimated using the LastZ alignment of the two reference genomes. Only regions with alignments between the two species' genomes were considered in analyses of π corrected for divergence with macaques (Faces = 58 regions, Height = 356 regions, Neutral = 4873 regions). For the aligned regions, the average alignment lengths were 1832.21 ± 4.49 sites out of 2000. We compared the distribution of summary statistics for face-associated loci to the distributions for the two control datasets using one-tailed Mann-Whitney U tests.
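As an illustration of the statistics defined above, the following Python sketch computes a folded site frequency spectrum, π, Watterson's θ and Tajima's D for one window from per-site allele counts. It is a minimal sketch, not the Excel macros used in the study. The function names and input layout are hypothetical; the formulas are the standard textbook definitions, with Tajima's D using the usual variance constants. π and θ are returned as per-window totals, so dividing by the window length (or, for the corrected statistic, by the human-macaque divergence) is left to the caller.

```python
import math
from collections import Counter


def folded_sfs(alt_counts, n):
    """Count segregating sites by minor allele count.

    `alt_counts` holds the alternate-allele count at each site and `n` is
    the number of sampled chromosomes.
    """
    return Counter(min(k, n - k) for k in alt_counts if 0 < k < n)


def diversity_stats(alt_counts, n):
    """Return (pi, theta_w, tajimas_d) for one window of biallelic sites."""
    seg = [k for k in alt_counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    # pi: expected number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1  # Watterson's estimator
    if s == 0:
        return pi, theta_w, float("nan")
    # Constants for the variance of (pi - theta_w) under neutrality
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return pi, theta_w, (pi - theta_w) / math.sqrt(variance)
```

A window dominated by singletons gives a negative D, whereas a window with more intermediate-frequency variants gives a positive D, matching the expectations described above.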
Patterns of Diversity Across Populations
We asked whether the same loci showed elevated diversity in different populations. To do this we identified loci in each population for which both π/divergence and Tajima's D were above the 95th percentile of the empirical distribution of the intergenic regions examined. We then asked whether a disproportionate number of loci with elevated diversity were shared between continents for face regions compared to the intergenic regions examined. For the nine loci showing elevated diversity in at least one population, we investigated the patterns of haplotype sharing across populations. We examined the sequences in the 2kb window used for previous analyses. For those coordinates we downloaded a combined PED file including CHB, FIN and YRI from the 1000 Genomes project site (browser.1000genomes.org). We converted the PED files to FASTA format using PGDSpider [64]. This procedure produced a FASTA file containing the polymorphic sites found within the examined loci. Using the ‘pegas’ package in R [65] we created haplotype networks for each of the loci.
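The outlier criterion described above can be sketched in Python as follows. This is an illustrative sketch only: the nearest-rank percentile is one of several common definitions and is an assumption here, and the function names and input layout are hypothetical.

```python
import math


def empirical_percentile(values, q):
    """Nearest-rank q-th percentile of `values` (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def elevated_loci(locus_stats, neutral_stats, q=95.0):
    """Return loci that exceed the neutral q-th percentile for every statistic.

    `locus_stats` maps a locus name to a dict of statistics, for example
    {"pi_div": ..., "tajima_d": ...}; `neutral_stats` maps each statistic
    name to the list of values observed in the intergenic regions.
    """
    cutoffs = {
        name: empirical_percentile(vals, q)
        for name, vals in neutral_stats.items()
    }
    return {
        locus
        for locus, stats in locus_stats.items()
        if all(stats[name] > cutoff for name, cutoff in cutoffs.items())
    }
```

Running `elevated_loci` once per population and intersecting the resulting sets would identify the loci with elevated diversity that are shared between populations.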
Sliding Window Analyses
To examine the extent to which selection for identity signaling has been shared or divergent across continents we conducted a sliding-window analysis of the regions identified as having elevated diversity in at least one population. We calculated π and Fst for 5kb windows every 1kb for a total of 200 kb. π was calculated for one representative population for each continent (FIN, CHB and YRI). We estimated levels of differentiation between populations using Hudson's Fst following[66], as it produces unbiased estimates of Fst and is less sensitive to sample size and rare variants than other estimators of Fst such as Weir-Cockerham's and Nei's[67]. We estimated Fst for each set of SNPs considered by calculating a ratio of averages rather than an average of ratios, as the former is less sensitive to the presence of rare variants in a sample [66].
The SNPs identified in association with facial morphology are not found in coding sequences, so they are likely to influence gene regulation or splicing in some manner. For the two loci examined in greater detail, we used the UCSC genome browser to identify polymorphic sites in ENCODE regulatory regions. We focused on three ENCODE tracks in the UCSC browser[68]: H3K27Ac marks, DNase sensitivity clusters, and transcription factor binding sites. The H3K27Ac marks show regions for which there is ChIP-seq based evidence of enrichment for the H3K27Ac histone mark. H3K27 acetylation is associated with enhanced transcription. DNase sensitivity clusters show regions sensitive to DNase as assessed across 125 cell types. Promoters and other regulatory regions tend to be DNase sensitive. The transcription factor track shows regions with evidence of transcription factor binding sites.
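The ratio-of-averages Fst estimate used in the sliding-window analysis of this section can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it uses the commonly cited per-site numerator and denominator of Hudson's estimator, and the function names, input layout and window coordinates are hypothetical.

```python
def hudson_components(p1, n1, p2, n2):
    """Per-site numerator and denominator of Hudson's Fst.

    p1 and p2 are the sample frequencies of one allele in the two
    populations; n1 and n2 are the numbers of sampled chromosomes.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def hudson_fst(sites):
    """Ratio of averages over `sites`, an iterable of (p1, n1, p2, n2)."""
    num_sum = den_sum = 0.0
    for p1, n1, p2, n2 in sites:
        num, den = hudson_components(p1, n1, p2, n2)
        num_sum += num
        den_sum += den
    return num_sum / den_sum if den_sum > 0 else float("nan")


def sliding_windows(start, end, size=5000, step=1000):
    """Yield (window_start, window_end) pairs that fit within [start, end)."""
    pos = start
    while pos + size <= end:
        yield pos, pos + size
        pos += step
```

Summing numerators and denominators across all SNPs in a window before dividing is what makes this a ratio of averages; averaging per-SNP ratios instead would give more weight to rare variants, whose denominators are small.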
Gene Trees
For the two 5kb loci examined we constructed maximum likelihood gene trees with 10 sequences each from the FIN, CHB and YRI 1000 Genomes populations for a total of thirty sequences. Additionally, we included the human and chimpanzee reference sequences as well as sequences for Denisovans[69] and Neanderthals (http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/bam/). We downloaded the alignment of the human and chimpanzee reference sequences from Ensembl. Denisovan sequences were downloaded using the Table Browser function of the UCSC Genome Browser. The draft Altai Neanderthal sequences were downloaded for the relevant chromosomes from the website of the Department of Evolutionary Genetics at the Max Planck Institute. We constructed individual sequences for the 1000 Genomes, Denisovan and Neanderthal samples by manually altering the human reference sequence in accordance with the data found in the respective VCF files using Mega 5.2.1[70]. For the phased 1000 Genomes data we selected one chromosome per individual sample. For the Neanderthal and Denisovan sequences, we included all of the sites that differed from the human reference to make a single sequence. After removing sites with gaps in the alignment, we constructed a maximum likelihood tree using a general time reversible model with a gamma distribution of invariant sites.
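The step of altering the human reference sequence according to a sample's variant calls was carried out manually in the study. As a hedged illustration of the same idea, the Python sketch below substitutes single-nucleotide alternates into a reference string. The function name and the (position, ref, alt) tuple layout are hypothetical, and skipping indels is a simplification made here because they would shift the alignment.

```python
def apply_snvs(reference, region_start, variants):
    """Return `reference` with single-nucleotide alternate alleles substituted.

    `reference` is the reference sequence of the region, `region_start` is
    the genomic coordinate of its first base, and `variants` is an iterable
    of (pos, ref, alt) tuples in the same coordinate system.
    """
    seq = list(reference)
    for pos, ref, alt in variants:
        idx = pos - region_start
        if not 0 <= idx < len(seq):
            continue  # variant lies outside the region of interest
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels are skipped in this simplified sketch
        if seq[idx].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        seq[idx] = alt
    return "".join(seq)
```

For phased data, passing only the variants carried on one chosen chromosome of an individual would correspond to selecting one chromosome per sample as described above.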