The evolutionary history of genes involved in spoken and written language: beyond FOXP2.

Alessandra Mozzi1, Diego Forni1, Mario Clerici2,3, Uberto Pozzoli1, Sara Mascheretti4, Franca R Guerini3, Stefania Riva1, Nereo Bresolin1,5, Rachele Cagliani1, Manuela Sironi1.   

Abstract

Humans possess a communication system based on spoken and written language. Other animals can learn vocalization by imitation, but this is not equivalent to human language. Many genes were described to be implicated in language impairment (LI) and developmental dyslexia (DD), but their evolutionary history has not been thoroughly analyzed. Herein we analyzed the evolution of ten genes involved in DD and LI. Results show that the evolutionary history of LI genes for mammals and aves was comparable in vocal-learner species and non-learners. For the human lineage, several sites showing evidence of positive selection were identified in KIAA0319 and were already present in Neanderthals and Denisovans, suggesting that any phenotypic change they entailed was shared with archaic hominins. Conversely, in FOXP2, ROBO1, ROBO2, and CNTNAP2 non-coding changes rose to high frequency after the separation from archaic hominins. These variants are promising candidates for association studies in LI and DD.

Year:  2016        PMID: 26912479      PMCID: PMC4766443          DOI: 10.1038/srep22157

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Language, understood as the capacity to generate a limitless range of expressions using the combination of a limited set of elements and rules, is a distinctive attribute of humans. Other animals, including great apes, communicate using simpler systems that lack the open-ended power of human language1. An important component for the development of spoken language is the capacity of imitation. Vocal imitation and learning are not exclusively human, as different species of songbirds, in addition to hummingbirds and parrots, are known to learn vocalization by imitation. These species have often been referred to as “vocal-learners”, although recent observations suggest that vocal-learning abilities may be distributed as a continuum rather than as a categorical trait2. Thus, among mammals, some marine (cetaceans, pinnipeds) and terrestrial (elephants and some bats) species may be described as complex-vocal learners23. For the sake of simplicity, herein we refer to complex-vocal learners (both mammalian and avian) as vocal-learners and to all other species as non-learners. To date, the evolutionary origin of complex-vocal learning (independent gains, multiple losses from a complex-vocal learner ancestor, continuum in vocal learning abilities) remains to be elucidated12. However, animal vocalization, including birdsong, lacks semantics and syntax, thus differing substantially from human language4. Importantly, whereas most animals use vocal communication, humans are unique in their use of written language. This implies the development of a system of decoding among sounds, symbols, and concepts. As first suggested by Mattingly5, “reading is parasitic on speech”, as it depends on all the components of the spoken language: syntax, morphology, phonology, pragmatics, and lexicon67. The close relationship between spoken and written language skills is well accepted and particularly evident in the comorbidity between language impairment (LI) and developmental dyslexia (DD)8.
LI and DD are common neurodevelopmental disorders characterized by unexpected difficulties with verbal language and reading, respectively, despite adequate educational and socioeconomic opportunity and instruction, as well as otherwise normal development9. In recent years, molecular genetics studies in family or case-control settings have identified candidate genes for LI and DD, with several genetic risk factors contributing to both conditions1011. The first gene to be implicated in a severe speech and language disorder was FOXP2, found to be mutated in a large family affected by verbal dyspraxia12. Since its identification, the role of FOXP2 in language (dis)abilities has been independently confirmed in several studies13, and an evolutionary analysis of its coding sequence revealed two human-specific amino acid substitutions14. This led to the hypothesis that recent changes in the FOXP2 protein have contributed to the development of human verbal skills14. This possibility has been supported by studies with animal models and cell lines1516, but challenged by other observations1718. The discovery that Neanderthals already possessed the human-specific FOXP2 variants17 fueled speculation on their impact (or lack thereof) on the development of language1 and on the timing of modern language origin1920. In fact, considerable debate still exists as to whether archaic hominins possessed a communication system comparable to that of modern humans1920212223. More recently, Maricic and coworkers24 identified a regulatory substitution in FOXP2 that is almost fixed in modern human populations, but absent in Neanderthals and Denisovans. The authors suggested that a combination of coding and regulatory variants in FOXP2 contributed to the development of modern language. Whereas this possibility remains to be verified, the FOXP2 example highlights the power of evolutionary analyses to generate specific hypotheses that can be tested using molecular genetics approaches.
After the identification of FOXP2, a number of genes have been described to be implicated in LI and DD, but their evolutionary history was not thoroughly examined. Herein we took advantage of genetic diversity data for human populations and great apes, as well as of genomic information for archaic hominins, mammals, and birds to provide insight into the evolution of ten genes involved in LI and DD. These genes were selected based on the evidence of association with LI and/or DD in humans (Table 1). Most of them have established functions in brain processes including neuronal migration, cell adhesion, or axon guidance (ROBO1, ROBO2, KIAA0319, DYX1C1, CNTNAP2), as well as calcium homeostasis (ATP2C2)102526.
Table 1

List of genes.

Gene | Protein name | Mammals, average dN/dS (CI) | Aves, average dN/dS (CI) | Disorder (a) | Compromised ability (b) | Key references
ATP2C2 | Calcium-transporting ATPase type 2C member 2 (ATPase 2C2) | 0.114 (0.108, 0.120) | 0.102 (0.094, 0.111) | LI | Language | 87
CMIP | C-Maf-inducing protein (c-Mip) | 0.022 (0.018, 0.026) | 0.023 (0.017, 0.031) | LI | Language, reading | 61, 62, 87
CNTNAP2 | Contactin-associated protein-like 2 | 0.074 (0.070, 0.079) | 0.076 (0.069, 0.084) | LI | Language, reading | 61, 88, 89
DCDC2 | Doublecortin domain-containing protein 2 | 0.222 (0.207, 0.238) | 0.386 (0.361, 0.413) | DD | Reading | 62, 90, 91
DYX1C1 | Dyslexia susceptibility 1 candidate gene 1 protein | 0.228 (0.214, 0.242) | 0.361 (0.333, 0.389) | DD | Reading | 92, 93, 92, 93
FOXP2 | Forkhead box protein P2 | 0.034 (0.028, 0.042) | 0.100 (0.080, 0.124) | LI | Language, Speech | 12
KIAA0319 | Dyslexia-associated protein KIAA0319 | 0.312 (0.301, 0.324) | 0.237 (0.222, 0.252) | DD | Reading, language | 60, 61, 62, 94
NFXL1 | NF-X1-type zinc finger protein NFXL1 | 0.149 (0.141, 0.158) | 0.139 (0.128, 0.152) | LI | Language, Speech | 70
ROBO1 | Roundabout homolog 1 | 0.054 (0.050, 0.058) | 0.045 (0.040, 0.050) | DD | Reading, Language | 44, 49
ROBO2 | Roundabout homolog 2 | 0.052 (0.048, 0.057) | 0.046 (0.040, 0.052) | DD | Language | 71

(a) Disorder initially associated with the gene.

(b) Compromised abilities associated with variants in the gene.

Results

Adaptive evolution in Mammals and Aves

We analyzed the evolutionary history of 10 genes reliably associated with LI and DD (Table 1) by retrieving mammalian and avian coding sequences from public databases (see methods; Supplementary Table S1). Recombination can confound evolutionary analyses by introducing apparent substitution rate heterogeneity among sites27, and by causing the estimated phylogeny to have excessively long terminal branches28. We thus screened the DNA alignments for the presence of recombination with GARD (genetic algorithm recombination detection). In the mammalian phylogeny breakpoints were detected in CMIP, CNTNAP2, DCDC2, and FOXP2; in aves, one breakpoint was detected for the DCDC2 gene (Supplementary Table S2). Taking this information into account, we calculated the average non-synonymous substitution/synonymous substitution rate (dN/dS, also referred to as ω) for the ten genes using the single-likelihood ancestor counting (SLAC) method29. As observed for most mammalian and avian genes3031, the dN/dS ratio was lower than 1 in all cases (Table 1), indicating that purifying selection is the major force shaping diversity at LI and DD genes in both animal classes. Because positive selection can act on a few sites in a protein that is otherwise selectively constrained, we applied likelihood ratio tests (LRT) implemented in the codeml program32. LRTs were run over whole gene alignment or on subregions split on the basis of the recombination breakpoints. Under two different codon frequency models (F3 × 4 and F61), two neutral models (M8a and M7) were rejected in favor of the M8 positive selection model for the mammalian ATP2C2, CNTNAP2, DYX1C1, NFXL1, and ROBO2 genes (Table 2 and Supplementary Table S3). In aves, these conditions were verified for ATP2C2, DCDC2, FOXP2, and NFXL1 (Table 2 and Supplementary Table S3). Thus, these genes represented targets of positive selection in mammals, birds, or both.
Table 2

Likelihood ratio test (LRT) statistics for models of variable selective pressure among sites (codon frequency: F3 × 4).

MAMMALS
Gene | Model (a) | −2ΔlnL (b) | p value (Bonferroni corrected) | MEME-BEB sites (c)
ATP2C2 | M8a vs M8 | 14.825 | 1.180 × 10^-4 | Q412
ATP2C2 | M7 vs M8 | 11.517 | 3.156 × 10^-3 |
CNTNAP2 (reg2) | M8a vs M8 | 23.894 | 1.018 × 10^-6 (2.036 × 10^-6) |
CNTNAP2 (reg2) | M7 vs M8 | 12.165 | 2.282 × 10^-3 (4.565 × 10^-3) |
DYX1C1 | M8a vs M8 | 16.610 | 4.592 × 10^-5 | K14, C182
DYX1C1 | M7 vs M8 | 29.270 | 4.406 × 10^-7 |
NFXL1 | M8a vs M8 | 12.237 | 4.684 × 10^-4 | T49, G687, T907
NFXL1 | M7 vs M8 | 71.035 | 3.757 × 10^-16 |
ROBO2 | M8a vs M8 | 4.156 | 4.148 × 10^-2 |
ROBO2 | M7 vs M8 | 32.032 | 1.107 × 10^-7 |

AVES
Gene | Model (a) | −2ΔlnL (b) | p value (Bonferroni corrected) | MEME-BEB sites (d)
ATP2C2 | M8a vs M8 | 7.878 | 5.004 × 10^-3 | A7, F79, H933, C940
ATP2C2 | M7 vs M8 | 23.618 | 7.436 × 10^-6 |
DCDC2 (reg2) | M8a vs M8 | 27.893 | 1.282 × 10^-7 (2.564 × 10^-7) | L422, T423
DCDC2 (reg2) | M7 vs M8 | 42.071 | 7.319 × 10^-10 (1.464 × 10^-9) |
FOXP2 | M8a vs M8 | 11.320 | 3.483 × 10^-3 | N228, V376, Q383
FOXP2 | M7 vs M8 | 8.741 | 3.112 × 10^-3 |
NFXL1 | M8a vs M8 | 17.001 | 3.735 × 10^-5 | P302, I583, S728
NFXL1 | M7 vs M8 | 33.257 | 6.002 × 10^-8 |

Notes:

(a) M7 is a null model that assumes that 0 < ω < 1 is beta distributed among sites; M8 (positive selection model) is the same as M7 but also includes an extra category of sites with ω > 1. M8a is the same as M8, except that the 11th category cannot allow positive selection, but only neutral evolution.

(b) 2ΔlnL: twice the difference of the natural logs of the maximum likelihood of the models being compared.

(c) Positions refer to the human sequence (see also Supplementary table 2).

(d) Positions refer to the chicken sequence (see also Supplementary table 2).
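As an illustration of how the p values in Table 2 relate to the −2ΔlnL statistics, the following minimal sketch converts a likelihood ratio statistic into a chi-square tail probability. The degrees of freedom used here (1 for M8a vs M8, 2 for M7 vs M8) and the Bonferroni factor of 2 for genes split into two regions are assumptions, not stated in the text; they are chosen because they reproduce the tabulated values for the rows shown. The function names are hypothetical.

```python
import math

def lrt_pvalue(stat, df):
    """Upper-tail chi-square probability for an LRT statistic.

    Only df = 1 and df = 2 are handled, using their closed forms.
    """
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def bonferroni(p, n_tests):
    """Bonferroni correction, capped at 1."""
    return min(1.0, p * n_tests)

# Mammalian ATP2C2 statistics taken from Table 2.
p_m8a_vs_m8 = lrt_pvalue(14.825, df=1)  # assumed df = 1
p_m7_vs_m8 = lrt_pvalue(11.517, df=2)   # assumed df = 2

# Mammalian CNTNAP2 (reg2): assumed correction for two analyzed regions.
p_reg2 = lrt_pvalue(23.894, df=1)
p_reg2_corrected = bonferroni(p_reg2, n_tests=2)

print(f"{p_m8a_vs_m8:.3e} {p_m7_vs_m8:.3e} {p_reg2:.3e} {p_reg2_corrected:.3e}")
```

The closed forms avoid any dependency beyond the standard library; a general implementation would use a chi-square survival function from a statistics package.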

The Bayes Empirical Bayes (BEB) analysis3334 and the Mixed Effects Model of Evolution (MEME)35 were next applied to the selected genes in order to identify specific sites targeted by positive selection. To limit false positive results, only sites detected using both methods were considered (Fig. 1 and Table 2).
Figure 1

Domain representation of positively selected genes.

Sites showing evidence of positive selection are mapped onto the domain representation of the protein. Positions for mammalian and avian genes refer to the human and chicken sequences, respectively (see also Supplementary Table S2). Color codes and domain names are reported.

Among sites showing evidence of positive selection, K14 in mammalian DYX1C1 is located in the CS (or p23) domain, which is involved in the maintenance of folding and in protein-protein interaction36 (Fig. 1). The CS domain of DYX1C1 interacts with Hsp70, Hsp90, an E3 ubiquitin ligase known as CHIP37, as well as with the estrogen receptors (ERα and ERβ)38. In aves, residue F79 in ATP2C2 is located in the cation ATPase_N domain, which is thought to regulate enzyme function39 (Fig. 1). As for FOXP2, residue Q383 is part of a leucine-zipper region flanking the Zinc-finger domain (Znf) of the forkhead box protein P2 (Fig. 1); generally these regions are functionally required for dimerization and transcriptional regulation40. Finally, in NFXL1, which is believed to act as a transcriptional repressor41, two residues showing evidence of positive selection (G687 and P302, in mammals and birds, respectively) are located in the Znf domains, stable finger-like protrusions that make tandem contacts with DNA.

Lineage-specific selection in mammals and birds

We next extended our analysis to explore possible variations in selective pressure across lineages. Specifically, we aimed to assess whether specific branches in the phylogenetic trees evolved under episodic positive selection. Because we did not want to make any a priori assumption about which lineages were more likely to have experienced adaptive evolution, the adaptive branch site-random effects likelihood (aBS-REL) method was applied42. Branches identified with aBS-REL were cross-validated using the branch-site LRT models implemented in codeml43. To be conservative, only branches that were supported by statistical evidence using both methods were considered (Table 3, Figs 1,2 and Supplementary Fig. S1). Positively selected sites for specific lineages were detected using the intersection of the BEB and MEME results.
Table 3

Likelihood ratio test (LRT) statistics for models of variable selective pressure among branches.

MAMMALS
Gene | Foreground branch (MA versus MA1) (a) | −2ΔlnL (b) | p value (FDR corrected) | MEME-BEB sites (c)
CNTNAP2, reg1 | Chiroptera | 8.948 | 2.778 × 10^-3 |
DCDC2, reg2 | Alpaca | 18.546 | 1.658 × 10^-5 (1.658 × 10^-5) |
DCDC2, reg2 | Dolphin | 23.572 | 1.204 × 10^-6 (2.328 × 10^-6) | E309, G310
DCDC2, reg2 | Ruminantia | 23.082 | 1.552 × 10^-6 (2.328 × 10^-6) | R379, Q417

AVES
Gene | Foreground branch (MA versus MA1) (a) | −2ΔlnL (b) | p value (FDR corrected) | MEME-BEB sites (d)
DCDC2, reg1 | Pigeon | 24.931 | 5.941 × 10^-7 | Q135, I140
FOXP2 | Hoatzin | 47.666 | 5.054 × 10^-12 (1.516 × 10^-11) | Q377, Q379
FOXP2 | Adelie penguin | 17.596 | 2.732 × 10^-5 (4.098 × 10^-5) | E385
FOXP2 | Golden-collared manakin | 11.781 | 5.985 × 10^-4 (5.985 × 10^-4) | H225
ROBO2 | Chimney swift | 15.853 | 6.846 × 10^-5 (1.141 × 10^-4) |
ROBO2 | Zebra Finch | 36.886 | 1.252 × 10^-9 (2.504 × 10^-9) | A814, A815, S816, T817

Notes:

(a) MA and MA1 are branch-site models that assume four classes of sites: the MA model allows a proportion of codons to have ω ≥ 1 on the foreground branches, whereas the MA1 model does not.

(b) 2ΔlnL: twice the difference of the natural logs of the maximum likelihood of the models being compared.

(c) Positions refer to the human sequence (see also Supplementary table 2).

(d) Positions refer to the chicken sequence (see also Supplementary table 2).
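The text states only that the branch-site p values in Table 3 were "FDR corrected". As a minimal sketch, assuming the Benjamini-Hochberg procedure was applied to the branches tested within one gene region, the correction can be written as below. Under that assumption the sketch reproduces the corrected values listed for the three mammalian DCDC2 (reg2) branches; it is not claimed to be the authors' exact procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Raw p values for mammalian DCDC2 (reg2), taken from Table 3.
branches = ["Alpaca", "Dolphin", "Ruminantia"]
raw = [1.658e-5, 1.204e-6, 1.552e-6]
corrected = dict(zip(branches, benjamini_hochberg(raw)))

for name in branches:
    print(f"{name}: {corrected[name]:.3e}")
```

Note that Dolphin and Ruminantia end up with the same adjusted value because the procedure enforces monotonicity across ranks.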

Figure 2

Branch-site analysis of positive selection.

aBS-REL analysis for the CNTNAP2 (A) and FOXP2 (B) genes in mammals and birds, respectively. Branch lengths are scaled to the expected number of substitutions per nucleotide. Red: branches that were confirmed to be under episodic positive selection using the codeml branch-site models.

Overall, evidence of episodic positive selection was obtained for few lineages both in the mammalian and in the bird phylogenies. No primate lineage or node showed evidence of episodic selection at these genes. Previous data44 indicated different selective pressure at the ROBO1 gene for the Homininae (human-chimpanzee-gorilla) branch; however, the branch-site LRT models provided no statistically significant evidence of episodic selection (nor did aBS-REL). Interestingly, episodic positive selection was detected for the bat branch at the CNTNAP2 gene (Fig. 2 and Table 3). In aves, three lineages, none of them representing vocal-learner species, showed robust evidence of episodic positive selection at FOXP2 (Table 3 and Figs 1,2). Most selected sites in avian FOXP2 are located within or in the vicinity of the leucine-zipper motif (Fig. 1).

Positive selection in humans and great apes

The FOXP2 gene acquired two amino acid substitutions (N303 and S325) after the split of humans from their common ancestor with chimpanzees1445, leading to the suggestion that the two changes might have contributed to the development of human linguistic abilities14. The availability of extensive genetic diversity data for humans and great apes now allows more thorough investigation of the evolution of genes involved in the development of human-specific abilities. Thus, we applied a population genetics-phylogenetics approach to analyze the evolutionary pattern of LI as well as DD genes in the human, chimpanzee, and gorilla lineages. In particular, we applied gammaMap46 that jointly uses intra-species variation and inter-specific diversity to estimate the distribution of selection coefficients (γ) along coding regions. gammaMap envisages 12 classes of γ, ranging from strongly beneficial (γ = 100) to inviable (γ = −500), with γ equal to 0 indicating neutrality. In line with the SLAC results, all genes were found to evolve under some degree of purifying selection (in all cases the median gamma was lower than or equal to – 1) in the three species (Fig. 3). Overall, selection coefficients tended to be lower for gorilla and chimpanzee than for human genes (Fig. 3).
Figure 3

Analysis of selective pressure in the human, chimpanzee and gorilla lineages.

Violin plot of selection coefficients for the three primate lineages (median, white dot; interquartile range, black bar). Selection coefficients (γ) are classified as strongly beneficial (100, 50), moderately beneficial (10, 5), weakly beneficial (1), neutral (0), weakly deleterious (−1), moderately deleterious (−5, −10), strongly deleterious (−50, −100), and inviable (−500).

Analysis of sites showing evidence of positive selection (defined as codons with a posterior probability >0.75 of γ ≥1) confirmed N303 and S325 in human FOXP2. In humans, seven sites were also identified in KIAA0319; all of them are located in the extracellular domain of the protein, with the exception of R13, which is part of the signal peptide. In particular, two sites (E306 and T327) fall within the predicted mucin-type O-glycosylation region47, and two residues (N364 and V765) are located in the PKD domains. These domains play a role in cell-cell adhesion processes48 (Fig. 1). Site 735, showing evidence of positive selection, is polymorphic in humans and corresponds to a low frequency SNP (rs2817191, V735A) (Fig. 1 and Supplementary Table S4). Notably, five sites in DD genes were found to display evidence of positive selection in the gorilla lineage. One of them is within KIAA0319 and is located in the last PKD domain. The positively selected site in the gorilla DYX1C1 gene is located in the above-mentioned CS domain (Fig. 1 and Supplementary Table S4). Finally, in the chimpanzee lineage we detected two sites showing evidence of positive selection in ROBO1, a gene associated with both language and reading phenotypes in human population studies (Fig. 1, Table 1 and Supplementary Table S4)4449.
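The site-level criterion used above (posterior probability >0.75 of γ ≥ 1) can be sketched as follows. The twelve γ classes are those listed in the Figure 3 legend; the per-codon posterior vectors, the codon labels, and the function names are made up for illustration and do not come from gammaMap output.

```python
# The 12 selection-coefficient classes named in the Figure 3 legend.
GAMMA_CLASSES = [100, 50, 10, 5, 1, 0, -1, -5, -10, -50, -100, -500]

def prob_beneficial(posterior):
    """Posterior mass assigned to classes with gamma >= 1."""
    if len(posterior) != len(GAMMA_CLASSES):
        raise ValueError("expected one probability per gamma class")
    return sum(p for g, p in zip(GAMMA_CLASSES, posterior) if g >= 1)

def positively_selected(codon_posteriors, threshold=0.75):
    """Codons whose posterior probability of gamma >= 1 exceeds the threshold."""
    return [codon for codon, posterior in codon_posteriors.items()
            if prob_beneficial(posterior) > threshold]

# Hypothetical posteriors for two codons (each vector sums to 1).
example = {
    "codon_A": [0.30, 0.25, 0.15, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01, 0.0, 0.0],
    "codon_B": [0.0, 0.0, 0.01, 0.02, 0.05, 0.20, 0.30, 0.20, 0.12, 0.06, 0.03, 0.01],
}
selected = positively_selected(example)
print(selected)
```

In this toy input only codon_A passes, since about 0.85 of its posterior mass lies on beneficial classes, whereas codon_B has about 0.08.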

Selective sweeps in modern humans

We finally investigated whether positive selection acted on LI and DD genes during the recent evolutionary history of human populations. Using the 1000 Genomes Phase 1 data for Yoruba (YRI), Europeans (CEU), and Chinese (CHB), we calculated pairwise FST50, an estimate of population genetic differentiation, and performed the DIND (Derived Intra-allelic Nucleotide Diversity) test51. Statistical significance (in terms of percentile rank) was obtained by deriving empirical distributions. SNPs were considered as positive selection targets if a rank ≥0.99 was obtained for both the FST and DIND tests in the same population. As a confirmatory signature (but not in the initial detection of selection targets), we calculated normalized values for Fay and Wu’s H (DH)52 in sliding windows along the analyzed genomic regions. Four genes displayed signals of positive selection (Fig. 4 and Supplementary Table S5), with some of them showing multiple signatures possibly ensuing from distinct selective events. Several selective sweeps were accounted for by SNPs that reached high derived allele frequency (DAF) in one or more human populations (Supplementary Table S5); in most cases high DAF signals identified through the FST and DIND tests were validated by DH (i.e. the DH value was below the 1st percentile), in line with this statistic having maximum power for high-frequency sweeps52 (Fig. 4 and Supplementary Table S5). One of the selected haplotypes in CNTNAP2 carries a set of variants (rs802567, rs802569, rs802571, and rs802558) in full LD (r2 = 1 in Europeans) with rs802568, which was associated with schizophrenia and bipolar disorder in genome-wide association studies (the ancestral allele increases disease risk)53. A previous population genetics analysis of FOXP2 targets detected two major selection signatures at the CNTNAP2 locus54. Both signals spatially overlap with those we describe in introns 1 and 13.
In ROBO1, a cluster of SNPs showing evidence of positive selection surrounds the transcriptional start site of the alternative isoform ROBO1b (Fig. 4).
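The detection rule described above (a percentile rank ≥0.99 for both FST and DIND in the same population) can be sketched as below for a single population. How the empirical background distributions were built is not detailed in this section, so the backgrounds, SNP values, identifiers, and function names here are hypothetical placeholders.

```python
from bisect import bisect_right

def percentile_rank(value, sorted_background):
    """Fraction of background values that are <= value."""
    return bisect_right(sorted_background, value) / len(sorted_background)

def selection_targets(snps, fst_background, dind_background, cutoff=0.99):
    """IDs of SNPs whose FST and DIND ranks both reach the cutoff."""
    fst_sorted = sorted(fst_background)
    dind_sorted = sorted(dind_background)
    hits = []
    for snp in snps:
        fst_rank = percentile_rank(snp["fst"], fst_sorted)
        dind_rank = percentile_rank(snp["dind"], dind_sorted)
        if fst_rank >= cutoff and dind_rank >= cutoff:
            hits.append(snp["id"])
    return hits

# Hypothetical empirical backgrounds: 1000 evenly spaced values each.
background = [i / 1000 for i in range(1000)]

# Hypothetical SNPs: snp_A is extreme for both statistics, snp_B only for FST.
snps = [
    {"id": "snp_A", "fst": 0.9955, "dind": 0.9975},
    {"id": "snp_B", "fst": 0.9955, "dind": 0.5005},
]
hits = selection_targets(snps, background, background)
print(hits)
```

Requiring both statistics to be extreme in the same population is what makes the rule conservative: snp_B is highly differentiated but is not retained because its DIND rank is unremarkable.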
Figure 4

Location of the selection targets in human populations.

The gene structures of CNTNAP2 (A), FOXP2 (B), ROBO1 (C), and ROBO2 (D) are shown. Candidate selection targets are shown as triangles, with colors indicating the derived allele frequency of each SNP (red: DAF > 0.80, black: DAF < 0.80). Blue rectangles represent genomic windows with a DH value lower than the 1st percentile (see methods for details). The location of variants cataloged as modern-human-specific sites in ref. 56 is shown (gray circles). The black dot in (B) represents rs114972925 (see text).

We next investigated whether the selected alleles were already present in archaic hominins. Analysis of ancient DNA samples indicated that both a Denisova55 and an Altai Neandertal56 individual were homozygous for the ancestral allele at the overwhelming majority (86.5%) of SNPs showing evidence of positive selection (Supplementary Table S5). Specifically, all the selected haplotype blocks include a large proportion of alleles unique to modern humans. We thus conclude that all the selective events we detected occurred after the split of modern humans from extinct hominins. In fact, several variants we identified are included in a catalog of modern-human-specific sites, i.e. positions where the Denisova or Altai Neandertal sequences display the ancestral allele, whereas most (>90%) modern humans carry the derived allele56 (Fig. 4 and Supplementary Table S5). Previous analysis of the FOXP2 gene in Neanderthals indicated that the derived allele at rs114972925 rose to high frequency in modern humans but is absent in archaic hominins (i.e. this variant is a modern-human-specific site)24 (Fig. 4). rs114972925 shows very little LD (r2 < 0.1 in YRI) with the selection targets we identified in the gene and, using the criteria we applied herein, displays no selection signature (its DIND rank is 0.85 in YRI, DAF is 1 in CEU and CHB).

Discussion

In this study, we integrated data from different sources to provide a comprehensive analysis of the evolutionary history of genes involved in disorders of spoken and written language. We also performed an analysis of bird species, as these animals are increasingly recognized as excellent models to study the evolution of speech. In fact, vocal-learning species, both mammalian and avian, share specific behavioral and neuroanatomical features2. We included ten genes in this study, based on the strength of the evidence relating them to either LI, DD or both. Despite their generally strong functional constraint in both mammals and birds, about half of them were found to have evolved under diversifying selection, with selection targeting a small minority of sites in all genes. As we highlight below, because most of these genes are involved in a number of processes and expressed in a variety of tissues, there is no indication that the sites we identified modulate neurocognitive phenotypes in different species. For instance, in both mammals and aves, LI genes were not specifically targeted by episodic positive selection in vocal learning species. For birds, these data are in agreement with a previous study that searched for convergent accelerated evolution in vocal learners compared to non-learners: none of the genes studied herein was identified31. The authors, though, tested a specific hypothesis, and the analyses were not devised to detect selection at any lineage or node of the avian phylogeny. We used a different approach, as we did not make any a priori assumption. This allowed us to observe significantly higher dN/dS values in FOXP2 for three non-vocal learner bird species. Most sites showing evidence of positive selection were located in the leucine-zipper motif, a region involved in dimerization. In humans, missense mutations in this region impair FOXP2 transcriptional activity and determine a language deficit phenotype57.
Nonetheless, because these species lack vocal-learning abilities, it is sensible to conclude that the selective pressure acting on FOXP2 in these birds is unrelated to vocal communication. This underscores the difficulty of relating individual changes, albeit driven by natural selection, to specific traits across species. For mammals, no branch in the phylogeny yielded evidence of episodic selection at FOXP2. Recently, a higher variability of bat FOXP2 genes compared to other mammals was reported; this was suggested to be related to echolocation rather than vocal learning58. In line with previous results58, the branch-site test for the bat lineage was not significant for FOXP2; evidence of episodic selection in Chiroptera was instead detected for CNTNAP2, a direct transcriptional target of FOXP2. Although bats are regarded as a promising candidate species for studies on vocal production and learning59, the results obtained for FOXP2 in birds should caution against drawing any conclusion about the role of CNTNAP2 (and FOXP2) variability in bats and the evolution of echolocation or vocal learning. The branch-site tests we applied did not detect lineage-specific selection at FOXP2 in humans or at ROBO1 in Homininae. The apparent discrepancy with previous findings lies in the different hypotheses tested: whereas we explicitly tested for positive selection, previous works tested for the constancy of the dN/dS ratio among lineages1444. It should also be noted that branch-site tests are robust, but lack power43. Indeed, we used gammaMap to search for lineage-specific selection in humans and great apes and we detected selection at human FOXP2. Nonetheless, the analysis of the selective patterns of DD and LI genes in the human and great ape lineages needs cautious interpretation. For the human lineage, several sites showing evidence of positive selection were identified in the KIAA0319 gene, which was repeatedly associated with DD and language abilities606162.
Most sites are located in protein regions (the O-glycosylated portion and the PKD domains) potentially involved in cell-cell adhesion and in neuronal migration4860. Moreover, two of the selected sites (N364 and R865) are human-specific, meaning that all other mammals sequenced to date carry the same ancestral residue. These substitutions were already present in Neanderthals and Denisovans, suggesting that any phenotypic change they entailed was shared between modern humans and archaic hominins. Furthermore, sites showing evidence of positive selection were detected in the gorilla and chimpanzee lineages at KIAA0319, as well as at other genes associated with DD (Fig. 1); three of these sites (E739 in KIAA0319, V16 in DYX1C1, and T296 in DCDC2) are specific to gorillas. Because gorillas cannot read, inference on the nature and effect of selection at these genes remains problematic. The identification of the two amino acid substitutions in human FOXP2 fostered a number of experimental studies. Introduction of the two human residues in the orthologous mouse protein was shown to determine changes in learning, behavior, as well as in dendrite morphology and synaptic plasticity of cortico-basal ganglia1563. Along these lines Konopka and coworkers16 showed that human and chimpanzee FOXP2 exert different effects on the transcriptional regulation of neurodevelopmental genes. Thus, despite their being shared with archaic hominins17 and, in the case of the N325 site, with carnivores45, at least one of the two substitutions is clearly functional and may have an effect on neurodevelopment. An interesting possibility is that several human-specific coding and regulatory changes in genes involved in LI and DD, each contributing relatively subtle effects, account for the development of spoken and written language in modern humans.
This hypothesis was also proposed by Maricic and coworkers24 upon discovery of a regulatory variant in FOXP2 that is almost fixed in human populations but absent in Neanderthals. We extended the analysis of recent positive selection in human populations to the ten LI and DD genes. Most selective sweeps we detected are at high frequency in one or more analyzed populations and all of them occurred after the split of modern humans from archaic hominins. We note that available methods that search for positive selection signals have more power for recent events, and the DIND test applied herein is no exception64. Thus, on one hand, the representation of modern-human-specific alleles among variants detected as selection targets is unsurprising. On the other hand, a number of fixed or almost fixed differences between modern humans and archaic hominins are expected not to be functional and to be due to drift. The combination of selection signals with information from archaic hominin genomes allows the identification of human-specific changes that rose to high frequency through a selective sweep and, therefore, must affect some phenotypic trait, these latter being the targets of natural selection. In fact, one of the signals we identified in CNTNAP2 is in full LD with a protective allele for schizophrenia and bipolar disorder (rs802568). Although this finding does not necessarily imply that selection primarily acted on the affective disorder phenotype, the selected variant/haplotype does modulate a phenotype. In general, the selective pressure responsible for the detected sweeps may be related to traits distinct from LI and DD, and even from cognitive capacities/disabilities in general. In fact, genome-wide association studies have detected variants in FOXP2 associated with traits as diverse as IgG glycosylation and blood pressure6566. Nonetheless, some signals are particularly suggestive of a functional role in cognitive processes.
For example, a cluster of selected SNPs in ROBO1 surrounds the transcription start site for the ROBO1b isoform. ROBO1a and ROBO1b were shown to be differentially regulated in fetal human brain areas related to hearing and speech6768. Adding to the relevance of ROBO1 transcriptional regulation, Wang et al. described its specialized expression in songbird vocal motor cortical regions during critical periods for vocal learning69. Thus, the selected variants/haplotypes we identified represent candidate modifiers of LI and DD phenotypes. In this respect, it is worth mentioning that the minor allele frequency of most selected alleles is very low or zero in several human populations. This is especially true for populations of non-African ancestry, which are most often analyzed in genetic studies. Thus, association analysis for these variants will require very large subject samples and/or the recruitment of cohorts of African/mixed African ancestry. The study of human distinctive traits such as the use of spoken and written language has received enormous attention in the scientific literature. In this field, evolutionary analyses hold the promise to unveil the genetic determinants of human uniqueness. The FOXP2 case has been emblematic in this respect, highlighting the strengths and weaknesses of evolutionary inference. Data herein extend the analysis to several other genes to generate an overall complex picture, whereby selection signatures are often difficult to relate to specific traits. The selected sites we identified should be regarded as potential modifiers of phenotypic traits, these latter not necessarily related to LI, DD, or other cognitive functions. Experimental analyses will be necessary to address the functional role of the selected changes we report and the phenotype they modulate. The lack of suitable experimental models for the study of human-specific traits, though, will make this task difficult to accomplish.

Methods

Gene selection

We analyzed genes that have been reliably associated with language impairment (LI) and developmental dyslexia (DD) (Table 1), as summarized by Paracchini and by Carrion-Castillo et al.1011. We also included FOXP2, known as the “language gene”12, as well as NFXL1 and ROBO2, which have recently been described as associated with LI or DD7071. Genes that were associated with LI or DD in the context of more complex phenotypes (e.g. FOXP17273) were not included in the study.

Evolutionary analysis in mammals and aves

Mammalian and avian coding sequences were retrieved from the Ensembl (http://www.ensembl.org/index.html) and the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) databases. Species were selected to be representative and to include vocal-learners in both classes; we analyzed a comparable number of species for the mammalian and avian phylogenies (Supplementary Table S1). DNA alignments were performed using the RevTrans 2.0 utility74 and checked using trimAl (automated1 mode)75; subsequently, manual editing was used to correct a few misalignments in proximity of small gaps. All alignments were screened for the presence of recombination breakpoints using GARD, a program that uses phylogenetic incongruence among segments of a sequence alignment to detect the best-fit number and location of recombination breakpoints76. We estimated the average non-synonymous substitution/synonymous substitution rate (ω) using SLAC (Single Likelihood Ancestor Counting)29. This method was selected because it allows calculation of average dN/dS (and its confidence intervals) while accounting for recombination. We used PAML (Phylogenetic Analysis by Maximum Likelihood) to detect positive selection32. The codeml NSsite models that allow (M8) or disallow (M8a, M7) a class of sites to evolve with ω > 1 were fitted to the data using two different codon frequency models: the F3 × 4 model (codon frequencies estimated from the nucleotide frequencies in the data at each codon site) and the F61 model (frequencies of each of the 61 non-stop codons estimated from the data)32. The total tree length for the genes or gene regions we analyzed ranged from 0.65 to 10.33; these values are within an optimal accuracy range for codeml sites models33.
Positively selected sites were identified using two different methods: the Bayes Empirical Bayes (BEB) analysis (with a cutoff of 0.90), which calculates the posterior probability that each codon is from the site class of positive selection (under model M8)33, and the Mixed Effects Model of Evolution (MEME) (with the default cutoff of 0.1)35, which allows the distribution of ω to vary from site to site and from branch to branch at a site. MEME allows the detection of both pervasive and episodic positive selection and has higher power than methods that assume constant dN/dS across lineages35. In order to identify specific branches with a proportion of sites evolving with ω > 1 (i.e. under episodic positive selection), we used aBS-REL, which applies sequential likelihood ratio tests to identify branches under positive selection42. One advantage of aBS-REL is that it requires no prior knowledge about which lineages are of interest (i.e. are more likely to have experienced episodic diversifying selection). Branches identified using this approach were cross-validated using the branch-site likelihood ratio tests from codeml (the so-called modified model A and model MA1, “test 2”)43. In this test, branches are divided a priori into foreground (those to be analyzed for positive selection) and background lineages, and a likelihood ratio test is applied to compare a model that allows positive selection on the foreground lineages with a model that does not allow such positive selection. An FDR correction was applied to account for multiple hypothesis testing, as previously suggested77. BEB analysis from MA (with a cutoff of 0.90) was used to identify sites that evolved under positive selection on specific lineages. GARD, MEME, SLAC and aBS-REL analyses were performed either through the DataMonkey server78 (http://www.datamonkey.org) or run locally (through the HyPhy suite79).
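Comparisons between nested codeml site models such as M7 or M8a against M8 are conventionally evaluated with a likelihood ratio test. The following is a minimal sketch of that calculation, not the authors' own code. It assumes the usual degrees of freedom (2 for M7 vs M8, 1 for M8a vs M8), and the log-likelihood values are hypothetical placeholders rather than results from this study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*deltaLnL, p-value) for a nested model comparison."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, as they would be read from codeml output.
lnl_m7, lnl_m8a, lnl_m8 = -10234.61, -10231.90, -10227.35

stat_m7_m8, p_m7_m8 = likelihood_ratio_test(lnl_m7, lnl_m8, df=2)
stat_m8a_m8, p_m8a_m8 = likelihood_ratio_test(lnl_m8a, lnl_m8, df=1)

print(f"M7 vs M8:  2dlnL = {stat_m7_m8:.2f}, p = {p_m7_m8:.2e}")
print(f"M8a vs M8: 2dlnL = {stat_m8a_m8:.2f}, p = {p_m8a_m8:.2e}")
```

Using one degree of freedom for M8a vs M8 is a conservative choice, because the null value of ω lies on the boundary of the parameter space.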
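The text does not state which FDR procedure was used for the branch-site tests. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure; the branch names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical branch-site LRT p-values, one per foreground branch.
branch_pvalues = {
    "branch_A": 0.001,
    "branch_B": 0.02,
    "branch_C": 0.04,
    "branch_D": 0.30,
}

names = list(branch_pvalues)
qvalues = dict(zip(names, benjamini_hochberg([branch_pvalues[n] for n in names])))
significant = [n for n in names if qvalues[n] < 0.05]
print(significant)
```

Here branch_C has a nominal p-value below 0.05 but is not retained after correction, which is the behavior the correction is meant to provide.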

Population genetics-phylogenetics analysis

We exploited data from the 1000 Genomes Pilot Project (1000G) for Europeans (CEU), Yoruba (YRI), and Chinese plus Japanese (CHBJPT)80. For chimpanzees and gorillas, we used SNP information from 25 and 27 individuals, respectively81. 1000G data were retrieved from the dedicated website (http://www.1000genomes.org/)80. Ancestral sequences were reconstructed by parsimony from the human, chimpanzee, orangutan and macaque sequences. Analyses were performed with gammaMap46, which evaluates intra-specific variation and inter-specific diversity to estimate, along coding regions, the distribution of selection coefficients (γ). In the analysis, we assumed θ (neutral mutation rate per site), k (transitions/transversions ratio), and T (branch length) to vary among genes following log-normal distributions. For each gene we set the neutral frequencies of non-STOP codons (1/61) and the probability that adjacent codons share the same selection coefficient (p = 0.02). For selection coefficients we considered a uniform Dirichlet distribution with the same prior weight for each selection class. For each gene we ran 10,000 iterations with a thinning interval of 10 iterations. To be conservative, we declared a codon to be targeted by positive selection when the cumulative posterior probability of γ ≥ 1 was > 0.75, as suggested82.
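The decision rule for calling a positively selected codon (cumulative posterior probability of γ ≥ 1 above 0.75) can be sketched as follows. This does not reproduce gammaMap itself. The grid of selection classes and the per-codon posterior probabilities are hypothetical inputs chosen for illustration.

```python
# Hypothetical grid of selection coefficient classes (an assumption of this sketch).
GAMMA_CLASSES = [-500, -100, -50, -10, -5, -1, 0, 1, 5, 10, 50, 100]


def positively_selected_codons(posteriors, gamma_classes=GAMMA_CLASSES, threshold=0.75):
    """Return codons whose cumulative posterior probability of gamma >= 1 exceeds threshold.

    posteriors maps a codon position to a list of posterior probabilities,
    one per selection class, in the same order as gamma_classes.
    """
    selected = []
    for codon, probs in posteriors.items():
        if len(probs) != len(gamma_classes):
            raise ValueError(f"codon {codon}: expected {len(gamma_classes)} probabilities")
        cumulative = sum(p for g, p in zip(gamma_classes, probs) if g >= 1)
        if cumulative > threshold:
            selected.append(codon)
    return selected


# Hypothetical posterior probabilities for two codons.
example_posteriors = {
    12: [0.0, 0.0, 0.0, 0.0, 0.02, 0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.08],
    57: [0.05, 0.10, 0.15, 0.20, 0.15, 0.10, 0.10, 0.05, 0.04, 0.03, 0.02, 0.01],
}

selected_codons = positively_selected_codons(example_posteriors)
print(selected_codons)
```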

Human population genetics analyses

Genotype information from Phase 1 of the 1000 Genomes Project was retrieved from the dedicated website (http://www.1000genomes.org/)83. A set of programs developed in C++ using the GeCo++84 and the libsequence85 libraries was used to organize SNP genotypes in a MySQL database, and to analyze them according to a specific genomic region. Genotype information was obtained for the 10 genes; in particular, three human populations with different ancestry were analyzed: Europeans (CEU), Africans (Yoruba, YRI), and East Asians (Han Chinese in Beijing, CHB). A set of ~2,000 randomly selected genes was used as a reference (hereafter referred to as the control set). These genes were selected to be longer than 5000 bp and to have more than 80% human-outgroup (chimpanzee, orangutan or macaque genomes) aligning bases; orthologous regions in the outgroups were retrieved using the LiftOver tool. The pairwise FST50 and the DIND (Derived Intra-allelic Nucleotide Diversity)51 test were calculated for all SNPs mapping to the analyzed genes, as well as for SNPs mapping to the control set. FST values are not independent of allele frequencies, so we binned variants in 50 classes based on the minor allele frequency (MAF) and calculated the empirical FST distribution for each MAF class using the control set data. The same procedure was applied for the DIND test; thus, we calculated statistical significance by obtaining an empirical distribution of DIND values for variants located within control genes; in particular, the DIND test was calculated using a constant number of 40 flanking variants (20 upstream and 20 downstream), as previously described86. DIND values for the three human populations were binned in 100 derived allele frequency (DAF) classes, and for each class the distributions were calculated. As suggested51, for values of iπD = 0 we set the DIND value to the maximum obtained over the corresponding class plus 20.
Only SNPs with both FST and DIND with a percentile rank ≥0.99 were considered as selection targets. We also calculated DH52 as a confirmatory signature of positive selection in human populations, using an approach based on 5 kb sliding windows moving with a step of 500 bp. Sliding window analyses have an inherent multiple testing problem that is difficult to correct because of the non-independence of windows. In order to partially account for this limitation, we calculated DH also for the control gene set, and the distribution of the statistic was obtained for the corresponding windows. This allowed calculation of the 1st percentile and the identification of regions below this threshold. In order to avoid spurious signals of selection, we evaluated the level of linkage disequilibrium (LD) between selected SNPs in the same population, and we defined a SNP as a positive selection target if it showed strong LD (r2 > 0.80) with at least two other selected SNPs.
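The frequency-matched empirical ranking described above can be illustrated with a short sketch. This is not the authors' C++ pipeline. It assumes equal-width MAF bins between 0 and 0.5, and the function names and the toy control data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def maf_bin(maf, n_bins=50, max_maf=0.5):
    """Map a minor allele frequency to one of n_bins equal-width classes."""
    return min(int(maf / max_maf * n_bins), n_bins - 1)


def build_null(control, n_bins=50):
    """Group control-set statistics (maf, value) by MAF class and sort each class."""
    bins = defaultdict(list)
    for maf, value in control:
        bins[maf_bin(maf, n_bins)].append(value)
    return {b: sorted(values) for b, values in bins.items()}


def percentile_rank(maf, value, null, n_bins=50):
    """Fraction of control values in the same MAF class that are <= value."""
    reference = null.get(maf_bin(maf, n_bins))
    if not reference:
        return None  # no control variants in this frequency class
    return bisect_right(reference, value) / len(reference)


# Toy control set: 100 variants with MAF 0.25 and FST values 0.00, 0.01, ..., 0.99.
control_set = [(0.25, i / 100) for i in range(100)]
null_fst = build_null(control_set)

rank_high = percentile_rank(0.25, 0.995, null_fst)
rank_mid = percentile_rank(0.25, 0.505, null_fst)
rank_missing = percentile_rank(0.013, 0.40, null_fst)
print(rank_high, rank_mid, rank_missing)
```

The same ranking function would apply to DIND values by replacing the MAF classes with 100 derived allele frequency classes.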
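The final LD-based filter can be sketched as below. The example assumes phased haplotypes coded as 0/1 alleles, which is a simplification, since the text does not specify how r2 was estimated. The SNP identifiers and haplotypes are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b
    return d * d / denom


def ld_supported(candidates, min_r2=0.80, min_partners=2):
    """Keep candidate SNPs in strong LD with at least min_partners other candidates."""
    kept = []
    for snp, hap in candidates.items():
        partners = sum(
            1
            for other, other_hap in candidates.items()
            if other != snp and r_squared(hap, other_hap) > min_r2
        )
        if partners >= min_partners:
            kept.append(snp)
    return kept


# Hypothetical candidate SNPs typed on ten haplotypes.
candidate_haplotypes = {
    "snp1": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "snp2": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "snp4": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

retained = ld_supported(candidate_haplotypes)
print(retained)
```

In this toy example snp4 is discarded because it is not in strong LD with any other candidate, which mirrors the requirement that isolated signals are not reported as selection targets.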

Additional Information

How to cite this article: Mozzi, A. et al. The evolutionary history of genes involved in spoken and written language: beyond FOXP2. Sci. Rep. 6, 22157; doi: 10.1038/srep22157 (2016).
  92 in total

1.  Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.

Authors:  Jianzhi Zhang; Rasmus Nielsen; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2005-08-17       Impact factor: 16.240

2.  p23 and HSP20/alpha-crystallin proteins define a conserved sequence domain present in other eukaryotic protein families.

Authors:  J A Garcia-Ranea; Gladys Mirey; Jacques Camonis; Alfonso Valencia
Journal:  FEBS Lett       Date:  2002-10-09       Impact factor: 4.124

3.  A forkhead-domain gene is mutated in a severe speech and language disorder.

Authors:  C S Lai; S E Fisher; J A Hurst; F Vargha-Khadem; A P Monaco
Journal:  Nature       Date:  2001-10-04       Impact factor: 49.962

4.  Identification and functional characterization of de novo FOXP1 variants provides novel insights into the etiology of neurodevelopmental disorder.

Authors:  Elliot Sollis; Sarah A Graham; Arianna Vino; Henning Froehlich; Maaike Vreeburg; Danai Dimitropoulou; Christian Gilissen; Rolph Pfundt; Gudrun A Rappold; Han G Brunner; Pelagia Deriziotis; Simon E Fisher
Journal:  Hum Mol Genet       Date:  2015-12-08       Impact factor: 6.150

5.  DCDC2, KIAA0319 and CMIP are associated with reading-related traits.

Authors:  Tom S Scerri; Andrew P Morris; Lyn-Louise Buckingham; Dianne F Newbury; Laura L Miller; Anthony P Monaco; Dorothy V M Bishop; Silvia Paracchini
Journal:  Biol Psychiatry       Date:  2011-03-31       Impact factor: 13.382

6.  CNTNAP2 variants affect early language development in the general population.

Authors:  A J O Whitehouse; D V M Bishop; Q W Ang; C E Pennell; S E Fisher
Journal:  Genes Brain Behav       Date:  2011-03-01       Impact factor: 3.449

7.  Great ape genetic diversity and population history.

Authors:  Javier Prado-Martinez; Peter H Sudmant; Jeffrey M Kidd; Heng Li; Joanna L Kelley; Belen Lorente-Galdos; Krishna R Veeramah; August E Woerner; Timothy D O'Connor; Gabriel Santpere; Alexander Cagan; Christoph Theunert; Ferran Casals; Hafid Laayouni; Kasper Munch; Asger Hobolth; Anders E Halager; Maika Malig; Jessica Hernandez-Rodriguez; Irene Hernando-Herraez; Kay Prüfer; Marc Pybus; Laurel Johnstone; Michael Lachmann; Can Alkan; Dorina Twigg; Natalia Petit; Carl Baker; Fereydoun Hormozdiari; Marcos Fernandez-Callejo; Marc Dabad; Michael L Wilson; Laurie Stevison; Cristina Camprubí; Tiago Carvalho; Aurora Ruiz-Herrera; Laura Vives; Marta Mele; Teresa Abello; Ivanela Kondova; Ronald E Bontrop; Anne Pusey; Felix Lankester; John A Kiyang; Richard A Bergl; Elizabeth Lonsdorf; Simon Myers; Mario Ventura; Pascal Gagneux; David Comas; Hans Siegismund; Julie Blanc; Lidia Agueda-Calpena; Marta Gut; Lucinda Fulton; Sarah A Tishkoff; James C Mullikin; Richard K Wilson; Ivo G Gut; Mary Katherine Gonder; Oliver A Ryder; Beatrice H Hahn; Arcadi Navarro; Joshua M Akey; Jaume Bertranpetit; David Reich; Thomas Mailund; Mikkel H Schierup; Christina Hvilsom; Aida M Andrés; Jeffrey D Wall; Carlos D Bustamante; Michael F Hammer; Evan E Eichler; Tomas Marques-Bonet
Journal:  Nature       Date:  2013-07-03       Impact factor: 49.962

8.  The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors:  Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal:  Nature       Date:  2013-12-18       Impact factor: 49.962

9.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

10.  On the antiquity of language: the reinterpretation of Neandertal linguistic capacities and its consequences.

Authors:  Dan Dediu; Stephen C Levinson
Journal:  Front Psychol       Date:  2013-07-05
