
Homo sapiens-Specific Binding Site Variants within Brain Exclusive Enhancers Are Subject to Accelerated Divergence across Human Population.

Rabail Zehra, Amir Ali Abbasi.

Abstract

Empirical assessments of human accelerated noncoding DNA fragments have revealed the presence of many cis-regulatory elements. Enhancers make up an important category of such accelerated cis-regulatory elements that efficiently control the spatiotemporal expression of many developmental genes. Previously published studies have stressed the importance of establishing plausible reasons for accelerated enhancer sequence divergence in Homo sapiens. Examining this acceleration alongside closely related primates and archaic human data has the potential to open up evolutionary avenues for explaining present-day brain structure. This study relied on empirically confirmed brain exclusive enhancers to avoid any misjudgments about their regulatory status, and identified among them a subset of enhancers with an exceptionally accelerated rate of lineage specific divergence in humans. In this subset, 13 distinct transcription factor binding sites were located that were unique to humans. Three of these 13 sites, belonging to transcription factors SOX2, RUNX1/3, and FOS/JUND, possessed single nucleotide variants that made them unique to H. sapiens upon comparison with Neandertal and Denisovan orthologous sequences. These variants modifying the binding sites in the modern human lineage were further substantiated as single nucleotide polymorphisms using 1000 Genomes Project Phase3 data. Long range haplotype based tests provided evidence of positive selection acting in the African population on two of the modern human motif modifying alleles, with the strongest results for the SOX2 binding site. In sum, our study acknowledges acceleration in the noncoding regulatory landscape of the genome and highlights functional parts within it that have undergone accelerated divergence in the present-day human population.

Year:  2018        PMID: 29608725      PMCID: PMC5952923          DOI: 10.1093/gbe/evy052

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Gene regulation has long been playing a role in fine-tuning the brain circuits that distinguish the highly cognitive human brain from that of comparatively lesser adaptive nonhuman primate brain function (Cáceres et al. 2003). Primate brain evolution displays a disproportionate enlargement of neocortex, frontal lobe and an overall larger brain volume, properties that underpin its intelligent workings (Dunbar and Shultz 2007). Human brain is triple in size and more efficiently adapted to do highly complicated assessments through language and cognitive skills than that of great apes (Geschwind and Rakic 2013). Evidence also suggests that human neocortex possesses a greater volume and significant cell cycle differences that lead to increased corticogenesis (Boyd et al. 2015). At molecular level, little evidence has been uncovered to relate gene sequence change with the phenotypic traits that bifurcate humans and the closest relative chimpanzee into two different strata of intelligence. It is however established that in gene regulation, the spatiotemporal expression of genes plays a defining role in making up the current form of highly adaptive brain of present-day humans (Enard et al. 2002; Cáceres et al. 2003; Gu and Gu 2003). Previous study stated that the human-chimp cerebral cortex relies on a special patterning of gene expression. Out of a gene pool considered in the study, 169 genes were observed to have expressed differently between human and chimpanzee. Among them, 91 genes hinted at being differently expressed in the human lineage alone, with macaque as an outgroup (Cáceres et al. 2003). About 90% of the genes that were differentially expressed in human lineage belonged to brain, whereas in liver and heart, nearly an equal number of genes were upregulated and downregulated between human and chimpanzee (Cáceres et al. 2003). 
Another analysis puts the number at 54 prefrontal cortex (PFC) genes having a lineage specific upregulation in human PFC after divergence from other hominoids (Geschwind and Rakic 2013). Recent findings have highlighted that human specific mutations in enhancers can impart huge changes in gene regulatory mechanisms and eventually produce brain size differences (Boyd et al. 2015). Although some enhancers lie proximal to the promoters of their genes, enhancers are widely catalogued as distal cis-regulatory elements that reside many kilobases (kb) away from their target genes and contribute to gene regulatory networks by initiating cell specific gene expression together with transcription factor (TF) occupancy (Spitz and Furlong 2012; Choukrallah et al. 2015). In mammals, enhancers are either active or primed. Active enhancers possess the biochemical signatures H3K27ac and H3K4me1 and are associated with actively expressed genes, whereas primed enhancers possess only the latter methylation mark and are most likely to be activated later by a developmental or environmental stimulus once a cell has acquired its tissue specific identity (Choukrallah et al. 2015). An enhancer sequence can recruit transcription factors in a variety of ways. TF cooperativity, either by direct interaction among adjacently binding TFs or through indirect cobinding with a cofactor, largely determines the transcriptional outcome an enhancer will deliver (Spitz and Furlong 2012). The functional implications of TF binding are debatable, as a TF binding event does not always imply regulatory control of the nearby genes. Many binding events have been termed nonfunctional and could be due to easier access to the chromatin that the TF has occupied, or to reconfiguration of the nucleosome induced by the binding event to facilitate occupancy by another TF that leads to gene expression (Spitz and Furlong 2012).
Differences in the transcription factor binding sites (TFBSs) between the species within the regulatory sequences can impart huge impact on the regulation of the associated genes. Substitution in intron 8 of FOXP2 gene within the vertebrate conserved POU3F2 binding site in the present-day humans when compared with Neandertals portrayed potential candidacy for driving selective sweep in the entire FOXP2 gene (Maricic et al. 2013). Selective sweep in a population, therefore, confers a genomic region significant where an allele offering a fitness advantage increases in frequency along with other neighboring alleles (linkage disequilibrium). This phenomenon renders the entire locus less diverse (Cadzow et al. 2014). Many of the accelerated portions of the genomes harbor developmental enhancers and genomic changes within them can impart huge alterations in brain function (Prabhakar et al. 2008; Burbano et al. 2012; Hubisz and Pollard 2014). Evolutionary studies have also endorsed acceleration in enhancer sequences compared with coding and noncoding/nonenhancer genomic blocks in vertebrates during land adaptation (Yousaf et al. 2015). A recent study has therefore consolidated this view where human specific changes in a neuro-developmental enhancer of FZD8 gene produced immense differences in the size of the brain (Franchini and Pollard 2015). Necessitating enhancers and their role in predominantly controlling the spatiotemporal expression of the genes, we uncovered sequential changes that rapidly accumulated in human brain enhancers (Maston et al. 2006). For that we devised a strong limiting criterion to include brain specific enhancers that are already functionally confirmed, bringing forth the safety of eliminating any genomic noncoding portions that failed to act as enhancers during functional verifications (Kvon 2015). 
This criterion is in line with recent studies that have rendered the use of biochemical signatures such as H3K4 monomethylation for enhancer function and prediction useless (Dorighi et al. 2017). Thus, out of our root data set of empirically confirmed, brain specific enhancers, we isolated those enhancers that showed significant signatures of acceleration upon comparison with closest nonhuman primates. By including archaic human data, we also pinpointed human unique TFBSs within these accelerated sequences that have been modified when compared with great apes and within them construed binding sites that are exclusive to H. sapiens. This study is commensurate with data that describes greater percentage of variants within noncoding regulatory genome than coding part of the genome. This work also brings forth patterns of accelerated divergence across present-day human population for SNPs residing in H. sapiens-specific TFBSs, ones which are not shared among the orthologous enhancer archaic and nonhuman primate sequences.

Materials and Methods

Determining Accelerated Cis-Elements within an In Vivo Catalog of Enhancers

We initiated our search for functionally confirmed enhancers by employing the in vivo repertoire of the VISTA enhancer browser (Visel et al. 2007). In sum, from an available total of 1,393 elements in VISTA with enhancer activity confirmed in different kinds of tissues, we collected only the 271 enhancers that showed endogenous expression profiles exclusively in brain regions (supplementary table 1, Supplementary Material online). The collected brain enhancers were divided into two subsets: an exclusive subset of enhancers expressing solely in the forebrain (104), midbrain (55), or hindbrain (38), and a second subset of enhancers expressing in either two (62) or three (12) of the aforementioned brain domains. Orthologous nonhuman primate sequences were collected through the UCSC genome browser via BLAT (Kent 2002; Karolchik et al. 2003). We used MAFFT to generate alignments of human and nonhuman primate orthologous enhancer sequences (Katoh et al. 2002). To detect patterns of enhancer sequence acceleration, we followed the approach defined by Haygood and coworkers (Haygood et al. 2007). Our analysis used a three-species alignment (human-chimp-macaque), the minimum number of sequences allowed. In the first round, a global proxy, intron 5 of the FHL1 gene, was used to gather a set of “possibly” accelerated enhancers (supplementary table 1, Supplementary Material online). Enhancers with a P value <0.05 (95% confidence level) were taken to be under positive selection. P values were corrected for false discovery rate (Q values) in this first round of analysis (Storey and Tibshirani 2003). However, some enhancers had alignment lengths greater than the proxy region, intron 5 of the FHL1 gene. To satisfy the length requirement that the proxy and target regions be at least equal in length (Haygood et al. 2007), we applied a larger 35.4 kb proxy region, intron 1 of the FHL1 gene, to all of the enhancers with possible selection signals from the first round (supplementary fig. 1 and table 2, Supplementary Material online). Local proxies, consisting of introns of genes residing within 100 kb of the enhancer of interest, were then employed to cut down on the number of false positives (supplementary table 3, Supplementary Material online). Noncoding, nonrepetitive, loosely conserved sites (NCNRS) were used as proxies to determine signals of positive selection on enhancers that were bracketed by longer gene deserts.
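The false discovery rate step can be illustrated with a short sketch. Note the assumptions: the study used Storey and Tibshirani Q values, whereas the code below implements the simpler Benjamini-Hochberg adjustment as a stand-in, and the P values are hypothetical numbers chosen only to show the mechanics.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    This is a stand-in for illustration; it is not the Storey-Tibshirani
    Q value procedure cited in the text.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-enhancer P values (not taken from the study).
example_p = [0.01, 0.04, 0.03, 0.20]
example_adjusted = benjamini_hochberg(example_p)
# Enhancers that would pass a 0.05 threshold after adjustment.
passing = [i for i, q in enumerate(example_adjusted) if q < 0.05]
```

With these toy inputs only the first enhancer stays below 0.05 after adjustment, which shows how multiple-testing correction can shrink a candidate list before the local-proxy round.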

Assigning Binding Motifs to Accelerated Enhancers

To determine binding motifs in the positively selected enhancers, TRANSFAC was searched for motifs belonging to a list of 142 carefully inspected TFs. The role of these TFs in human brain development was confirmed from the literature. The collected TFs were also matched to the expression domains of the enhancer sequences being searched, that is, all the collected TFs showed endogenous expression profiles in one of the brain domains (MGI: in situ RNA hybridization and Human Protein Atlas) (supplementary table 4, Supplementary Material online) (Uhlén et al. 2015; Blake et al. 2017). All motifs were noted that showed a significant human unique presence in the enhancer sequences when compared with orthologous nonhuman primate sequences from chimp, gorilla, orangutan, and macaque. To determine H. sapiens-exclusive TF binding motifs, orthologous sequences from Neandertals and Denisovans were added to the alignments (Meyer et al. 2012; Prüfer et al. 2014). The H. sapiens-unique binding motifs contained single nucleotide variants (SNVs) that distinguished the ancestral binding site from the derived binding site in modern humans.
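As a rough illustration of what "unique presence" of a motif means, the sketch below tests IUPAC consensus strings against the modern human and archaic sequences reported for the three H. sapiens-unique sites (table 1 and fig. 3). It is a simplification under stated assumptions: TRANSFAC scores sites with position weight matrices, whereas this code does exact consensus matching, and the function names are illustrative.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def matches(consensus, sequence):
    """True if the whole sequence fits the IUPAC consensus string."""
    pattern = "".join(IUPAC[code] for code in consensus.upper())
    return re.fullmatch(pattern, sequence.upper()) is not None


def human_unique(consensus, modern, other):
    """True if the modern human site fits the consensus and the other does not."""
    return matches(consensus, modern) and not matches(consensus, other)


# TF: (consensus from table 1, modern human site, archaic/primate site from fig. 3)
SITES = {
    "SOX2": ("NNNANAACAAWGRNN", "TAGACAACAATGGAT", "TAGACTACAATGGAT"),
    "RUNX1/3": ("TGTGGT", "TGTGGT", "TGTGGG"),
    "FOS/JUND": ("TGACTCA", "TGACTCA", "CGACTCA"),
}

results = {tf: human_unique(*site) for tf, site in SITES.items()}
```

Each of the three single nucleotide differences falls on a fixed (non-N) position of its consensus, which is why a single substitution is enough to switch the match on or off in this simplified model.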

Establishing Selection Regime on SNVs within Unique Sites

To explore population dynamics over the allelic variants among the H. sapiens-unique TFBSs, 1000 Genomes Project Phase3 data was employed to see the trend of natural selection among the human population (Consortium 2015). Unphased VCF files from 1000 Genomes Project were converted to phased haplotype files through fastPHASE (Scheet and Stephens 2006). In order to generate analysis that highlights the segregating alleles to be under the influence of positive selection, extended haplotype homozygosity (EHH) plots and relative EHH (rEHH) score were generated through package “rehh” (version 2.0.0) and Sweep software, respectively (Sabeti et al. 2007; Gautier and Vitalis 2012). Weir and Cockerham Fst values were computed through VCFtools to estimate significantly differentiated SNPs between populations (Danecek et al. 2011). The haplotype range defined had 300 kb region at either ends of the enhancer making up an entire region under consideration to be of approximately 600 kb. Bearing in mind that human populations belonging to different ethnicities hone different adaptive mechanisms because of being exposed to variable climatic differences and changeable adaptive pressures (Tekola-Ayele et al. 2015), we catered to such vast yet delicate regional inconsistencies by dissecting our allelic deductions into regional and worldwide graphical representations. The schematic illustration of the workflow is shown in figure 1.
Fig. 1. Schematic display of the steps carried out in the work design.
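The extended haplotype homozygosity statistic used in the selection analysis can be sketched in a few lines. This is a minimal illustration of the standard definition (the probability that two randomly chosen chromosomes carrying the same core allele are identical from the core out to a given marker); it is not the "rehh" or Sweep implementation, and the haplotypes below are hypothetical toy data.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, upto_index):
    """EHH among chromosomes carrying core_allele, from the core to upto_index.

    haplotypes: list of equal-length strings, one character per marker.
    """
    lo, hi = sorted((core_index, upto_index))
    # Keep only chromosomes with the core allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # homozygosity is undefined for fewer than two
    counts = Counter(carriers)
    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical phased haplotypes; marker 0 is the core SNP.
toy_haplotypes = ["ATTA", "ATTA", "ATTA", "ATCA", "TTCG", "TACG", "TTCA"]

ehh_a = [ehh(toy_haplotypes, 0, "A", i) for i in range(4)]
ehh_t = [ehh(toy_haplotypes, 0, "T", i) for i in range(4)]
```

In this toy example EHH for allele A decays more slowly with distance than EHH for allele T, which is the kind of contrast an EHH plot displays for a derived allele on a long haplotype.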

Results and Discussion

Identifying Accelerated Enhancers and Binding Motifs within Them

Human accelerated DNA fragments are those parts of the genome that have experienced frequent sequence changes after the human-chimp split despite being strongly conserved among mammals (Hubisz and Pollard 2014). In vivo analysis of such human accelerated noncoding regions revealed the presence of cis-regulatory transcriptional enhancers controlling the expression of many developmental genes (Prabhakar et al. 2008; Burbano et al. 2012). Recent findings on human specific mutations in enhancers have brought to light the massive implications gene regulation can have for brain size and eventually for highly developed brain function in humans (Boyd et al. 2015). We devised a strategy to find out the extent to which these human specific enhancer changes contribute to reshaping human brain circuits, and eventually to characterizing H. sapiens as the most successfully thriving member of the genus Homo. To pursue the investigation, we relied on an empirically verified, in vivo catalog of human brain specific enhancers derived by Visel and colleagues as the root data set of this study (Visel et al. 2007). We prioritized enhancers confirmed via transgenic mouse assays for their reliability over ChIP-seq predicted putative enhancers, which may be eliminated as nonenhancers due to experimental artifacts or the dubious nature of TF binding (Kvon 2015). We then set out to examine sequence mutations within these enhancers and the rate at which they have accumulated in the human lineage, in comparison with the closest relative, chimpanzee, taking macaque as an outgroup. We employed the approach of Haygood and coworkers, originally used to detect signals of positive selection on promoter sequences (Haygood et al. 2007).
The technique relies on a target-proxy association based upon the branch specific Wong and Nielsen test, a phylogenetic approach that takes an intronic proxy as the reference for estimating signals of positive selection in the target enhancer alignment on the foreground branch (Zhang et al. 2005). Unlike a contextual search for signals of positive selection, in which it is advised to stay within a 100 kb range of the target enhancer sequence to make sure the mutation rate does not vary between the intronic proxy and target enhancer regions, our preliminary search for accelerated rates in the candidate enhancer regions used “global” proxies (Haygood et al. 2007). Introns 1 and 5 of the housekeeping gene FHL1 on chromosome X, highly conserved among the three aforementioned species, were the initial choices. This choice of intronic proxy made the screening independent of genomic mutational hot and cold spots and of chromosomal context (Chuang and Li 2004). It enabled us to narrow down the enhancers with a supposedly higher chance of accelerated evolution in the human lineage than in the considered nonhuman primate orthologs. This approach yielded 86/271 enhancers predicted to be evolving at an accelerated rate (fig. 2 and supplementary tables 1 and 2, Supplementary Material online). To determine the extent of false positives, the 86 predicted fast evolving enhancers were subjected to a more rigorous, context based approach in which introns of a nearby gene residing within 100 kb were selected as locus specific intronic proxies for comparison with the enhancer of interest. For enhancers bracketed by longer gene deserts, random, loosely conserved, noncoding, and nonrepetitive sequences were used instead. This stringent criterion reduced the set of brain exclusive human accelerated enhancers (BE-HAEs) to 15 (fig. 2 and supplementary table 3, Supplementary Material online).
Fig. 2. 271 human brain specific VISTA enhancers: test for positive selection using the branch specific Wong and Nielsen method with human as the foreground branch. (a) The y-axis shows P values; the x-axis shows the 271 enhancers. Each enhancer was compared with the conserved intron 5 of the human FHL1 gene. 86/271 enhancers showed significant signals of positive selection (enhancers under the bar = P value < 0.05). (b) The 86 enhancers collected in (a) were subjected to a more robust analysis. Each enhancer was compared with a locus specific intronic proxy from a nearby gene. This analysis reduced the previous findings to 15 enhancers that persistently showed signals of positive selection (enhancers under the bar = P value < 0.05). (c) The resultant 15 enhancers were checked for human unique TFBSs on comparison with nonhuman primates (chimp, gorilla, macaque, and orangutan). Fifteen corresponding TFBSs were unique to human in nine of the enhancers with signals of positive selection. The asterisk on the bars indicates a modern human specific variant in the TFBS.

To establish in silico the human driven functional modules over the shortlisted 15 BE-HAEs, an extensive literature survey was used to compile a list of TFs with a functional role in one or more human brain domains. A total of 142 TFs were obtained and cross checked for their endogenous expression profiles (MGI: RNA in situ hybridization, Human Protein Atlas) to maintain expressional congruence with the selected set of brain enhancers (supplementary table 4, Supplementary Material online) (Blake et al. 2017). The corresponding binding profiles of the collected TFs were sought through TRANSFAC, a robust database of eukaryotic transcription factors (Matys et al. 2006).
Through initial examination of the binding profiles in BE-HAEs alignments by taking chimpanzee, gorilla, orangutan, and macaque as orthologous comparisons, 13 human unique binding motifs corresponding to 16 transcription factors occurring within 9/15 BE-HAEs came to notice (fig. 2 and table 1). Previously it has been reported that 8% of the human derived mutations in the accelerated regions of the genome are recent, estimated to have arisen in a span of 550–765 Kyr since the divergence of H. sapiens from archaic hominins (Burbano et al. 2012; Prüfer et al. 2014). It is also speculated that coding region mutations shared with archaic humans were followed by substitutions in regulatory elements that were H. sapiens-unique and hence attributed to anatomically profound modern human traits (Prabhakar et al. 2008; Maricic et al. 2013). To determine whether the modified 13 binding motifs in modern H. sapiens diverged after the split from archaic humans, orthologous archaic human sequences (Neandertals and Denisovans) were introduced to the alignments (Meyer et al. 2012; Prüfer et al. 2014). Three such TFBSs were seen to have evolved solely in modern humans for TFs SOX2, RUNX1/3, FOS/JUND within BE-HAEs hs1210 inhabiting H. sapiens-autosome 2 (Hsa2: 66762515–66765088), hs563 (Hsa6: 98491829–98493238), and hs304 (Hsa9: 8095553–8096166), respectively (fig. 3). Remainder shared sites among the three Homo species can be viewed in supplementary figure 2, Supplementary Material online. All of these modified TFBSs had single nucleotide variants (SNVs) within them that differentiated them into human or hominin specific set of TF binding profiles.
Table 1

Human Unique Transcription Factor Binding Sites in a Set of 15 Brain Exclusive Enhancers with Positive Selection Signals

| SN | ID | GRCh37/hg19 | Brain Domain | TF | TFBS |
|---|---|---|---|---|---|
| 1 | hs37 | chr16: 54650598–54651882 | Forebrain | PEA3 | ACWTCCK |
| 2 | hs1210 | chr2: 66762515–66765088 | Forebrain | SOX2^a | NNNANAACAAWGRNN |
| 3 | hs526 | chr4: 1613479–1614106 | Forebrain | NF1B | CTGGCASGV |
|   |   |   |   | POU3F2 | NWAAYAAW |
| 4 | hs563 | chr6: 98491829–98493238 | Hindbrain | RUNX1/3^a | TGTGGT |
| 5 | hs1366 | chr6: 38358690–38360084 | Midbrain | TCFAP2B | CCCCAGGC |
| 6 | hs1632 | chr11: 116521882–116522627 | Midbrain | ZIC1 | VGGGGAGS |
| 7 | hs1726 | chr18: 49279374–49281480 | Hindbrain |   |   |
| 8 | hs1526 | chr2: 104353933–104357342 | Forebrain | SOX9 | RNACAAAGGVN |
|   |   |   |   | PBX1 | NYAYMCATCAAWNWNNN |
| 9 | hs847 | chr4: 42150091–42151064 | Forebrain | LEF1 | NWTCAAAGNN |
|   |   |   |   | MEF2A | TATTTWWANM |
| 10 | hs540 | chr13: 71358093–71359507 | Forebrain |   |   |
| 11 | hs1019 | chr7: 20838843–20840395 | Forebrain |   |   |
| 12 | hs192 | chr3: 180773639–180775802 | Forebrain |   |   |
| 13 | hs1301 | chr11: 16423269–16426037 | Forebrain |   |   |
| 14 | hs430 | chr19: 30840299–30843536 | Midbrain |   |   |
| 15 | hs304 | chr9: 8095553–8096166 | Mid/Fore | FOS/JUND^a | TGACTCA/TGACTCAN |
|   |   |   |   | NR2F1 | TGACCTY |
|   |   |   |   | NURR1 | YRRCCTT |

Note: TF, transcription factor; TFBS, transcription factor binding site.

^a Modern human-specific TFBSs.

Fig. 3. Human accelerated enhancers with H. sapiens-unique transcription factor binding sites. (a) Human enhancer hs1210 (shown in brown) was shortlisted as an enhancer under positive selection when compared with MEIS1 introns, with a resultant P value of 0.03. An aligned patch within human forebrain enhancer hs1210 is shown with an existing transcription factor binding site of SOX2. The region showed a novel substitution within the binding site of SOX2 (TAGACA*ACAATGGAT) in the modern human lineage, unlike the consistent nucleotide observed for archaic humans, primates, and nonprimate mammals (TAGACT*ACAATGGAT). (b) Human enhancer hs563 (shown in brown) was shortlisted as under positive selection when compared with a noncoding nonrepetitive sequence, with a resultant P value of 0.03. An aligned patch within human hindbrain enhancer hs563 is shown with the existing transcription factor binding motif of RUNX1/RUNX3. The region showed a novel substitution within the binding site of RUNX1/RUNX3 (TGTGGT*) in the modern human lineage, unlike the consistent nucleotide observed for archaic humans, primates, and nonprimate mammals (TGTGGG*). (c) Human enhancer hs304 (shown in brown) was shortlisted as under positive selection when compared with a noncoding nonrepetitive sequence, with a resultant P value of 0.04. An aligned patch is shown with the existing transcription factor binding site of FOS/JUND. The region showed a novel substitution within the binding site of FOS/JUND (T*GACTCA) in the modern human lineage, unlike the consistent nucleotide observed for archaic humans, primates, and nonprimate mammals (C*GACTCA).

Signatures of Recent Positive Selection on SNVs within Binding Motifs

The three identified H. sapiens-unique single nucleotide variants (SNVs) modifying the binding motifs of SOX2, RUNX1/3, and FOS/JUND were further substantiated as single nucleotide polymorphisms (SNPs); the difference is that SNPs occur at a frequency of >1% in a population (Karki et al. 2015). The SNPs corresponding to BE-HAEs hs1210, hs563, and hs304 have the dbSNP IDs rs11897580, rs2498442, and rs6477258, respectively (Sherry et al. 2001). A SNP located in a functional domain such as a TFBS can modify the enhancer sequence. The two or more sites that are created as a result might offer variable binding properties to the TFs (original or new TF), eventually creating an activity bias for the enhancer they occupy. Several outcomes are plausible for the TFBS sequences that the two variants of a SNP create: 1) both variant TFBSs retain the original TF binding property, possibly with differential affinity, 2) the modified TFBS is impaired enough not to bind the original TF, 3) the altered TFBS can bind both original and new TFs, 4) the altered TFBS can bind only new TFs, or 5) the altered TFBS altogether loses the ability to bind any TF (Heckmann et al. 2010). Regulatory control over genes is established to have had major leverage in human evolution. Moreover, positive selection on genomic regions that may influence a functional structure is another major driving force that has shaped the current human status (Barreiro et al. 2008; Hussin et al. 2010). To establish the selection regime on such SNPs, we referred to 1000 Genomes Project Phase3 data and found the derived alleles (TFBS modifying variants in the H. sapiens lineage) of all three SNPs (rs11897580, rs2498442, and rs6477258) to occur near or below the intermediate frequency of 0.5, and hence not to be fixed in modern day human populations (table 2).
Exploiting the frequency and length of the haplotype with the variant at hand is resourceful in knowing the ongoing selection pattern on that variant and consequently its role in functional adaptation (Sabeti et al. 2002; Nielsen 2005; Voight et al. 2006). In order to see whether the derived alleles of all three SNPs lie in a putatively selected haplotype, we investigated them based upon the work of Sabeti and coworkers (Sabeti et al. 2002, 2007).
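Table 2

Derived Allele Frequencies and Weir and Cockerham Fst Values of SNPs within Enhancers hs1210, hs304, and hs563

| Enhancer | SNP | TFBS | D/A | DAF afr | DAF amr | DAF eur | DAF sa | DAF ea | Fst^a afr | Fst^a amr | Fst^a eur | Fst^a sa | Fst^a ea |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hs1210 | rs4452126 |   | T/C | 0.075 | 0.005 | 0.001 | 0 | 0 | 0.1 | 0.006 |   |   |   |
|   | rs550939004 |   | A/T | 0.09 | 0.0014 | 0 | 0 | 0 | 0.15 | 0.013 |   |   |   |
|   | rs11897580 | SOX2 | A/T | 0.13 | 0.006 | 0.001 | 0 | 0 | 0.2 | 0.01 |   |   |   |
| hs304 | rs6477258 | FOS/JUND | C/T | 0.28 | 0.25 | 0.29 | 0.34 | 0.32 | 0.0009 | 0.007 | −0.0003 | 0.006 | 0.001 |
| hs563 | rs2498442 | RUNX1/3 | G/T | 0.52 | 0.45 | 0.44 | 0.62 | 0.4 | 0.027 | 0.003 | 0.006 | 0.048 | 0.024 |

Note: DAF, derived allele frequency; Fst, Weir and Cockerham Fst; D, derived allele; A, ancestral allele; afr, Africa; amr, America; eur, Europe; sa, South Asia; ea, East Asia.

^a Weir and Cockerham Fst calculated between one population and the rest.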

Derived Allele Frequencies and Weir and Cockerham Fst Values of SNPs within Enhancers hs1210, hs304, and hs563 D, Derived; A, Ancestral allele; afr, Africa; amr, America; eur, Europe; sa, South Asia; ea, East Asia. Weir and Cockerham Fst calculated between one population and rest. Elucidating BE-HAE hs1210, we observed core haplotype 4 (CH4) to be selected with the highest upstream rEHH value carrying the derived allele of the SNP rs11897580 (T > A) for a 2.5 kb region in Africa (table 3). In the same positively selected haplotype we observed another derived allele of the SNP (dbSNP ID: rs4452126: C > T) inhabiting the same HAE to be cooccuring or hitchhiking with our derived allele of interest. Hitchhiking has a typical signature of linkage disequiblirum with it, that is, the nonrandom association between the beneficial allele under positive selection and the neighboring alleles increases, giving less time to recombination to break the association (Hussin et al. 2010). Hitchhiking effect has been limited to a region as low as 1 kb and less for regions where recombination is high and variation is more (Fay and Wu 2000). Noticeably, both derived alleles exist in >5% of Africans and absent/nearly absent elsewhere (table 2). This makes the speculation that the derived alleles of the SNPs rs11897580 and rs4452126 are hitchhiking in African haplotypes, or have been positiveley coselected for, implying sweep is underway in this region. Furthermore, EHH plots and bifurctaion diagrams constructed for both SNPs indicated that the derived alleles are segregating under the clear influence of positive selection than their respective ancestral counterparts for a region as long as 10.8 kb in Africans (fig. 4). 
To further confirm this, the Weir and Cockerham Fst test indicated that the two SNPs show statistically significant population differentiation between Africans and the other samples, implying that our allele of interest (the SOX2 TFBS modifying allele) is segregating under the influence of positive selection in Africa (table 2).
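To make the idea of population differentiation concrete, the following minimal sketch computes a simple Wright-style Fst for one biallelic SNP from two allele frequencies. It is not the Weir and Cockerham estimator used in this study, which additionally corrects for sample sizes; the function names and the input frequencies are hypothetical and chosen only to mimic a derived allele that is present in one population and absent in the rest.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def simple_fst(p_focal, p_rest):
    """Wright-style Fst = (H_T - H_S) / H_T for two equally weighted groups.

    Assumption: both groups get equal weight. The Weir and Cockerham
    estimator weights by sample size and is NOT implemented here.
    """
    p_bar = (p_focal + p_rest) / 2.0
    h_total = heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0  # site is monomorphic overall, nothing to differentiate
    h_within = (heterozygosity(p_focal) + heterozygosity(p_rest)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies: derived allele at 8% in the focal population,
# absent in the pooled remainder.
example_fst = simple_fst(0.08, 0.0)
print(round(example_fst, 4))
```

Identical frequencies give an Fst of 0 and fixed differences give 1, so values between these bounds indicate how much of the total diversity is explained by the population split.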
Table 3

Core Haplotypes with SNP rs11897580 within Enhancer hs1210 with Each Haplotype’s rEHH Score in African Population

Core Haplotype (CH)       Hap Freq      rEHH (u, d)     rEHH P Value (u, d)
CH1  C C T T A G          370 (0.56)    0.04, 0.19      0.98, 0.56
CH2  T C T T A A          106 (0.16)    1.05, 1.12      0.59, 0.55
CH3  C C A T A A          59 (0.09)     10.17, 8.76     0.13, 0.16
CH4  C T(a) T A(a) A A    53 (0.08)     48.51, 11.95    0.006, 0.1
CH5  C C T T G A          40 (0.06)     1.62, 0.56      0.69, 0.92
CH6  C C T A A A          33 (0.05)     4.19, 2.39      0.2, 0.35
Total = 661

Note. The table lists SNPs rs5006732, rs4452126, rs550939004, rs11897580, rs11681729, and rs10865355 in core haplotypes in a region of 2.5 kb. Hap Freq, haplotype frequency, given as count (proportion); u, upstream; d, downstream.

(a) Unique derived variants of SNPs rs4452126 (T) and rs11897580 (A) in CH4.

The significant result for haplotype CH4 (bold in the original table) is the upstream rEHH score of 48.51 (rEHH P value = 0.006).

Fig. 4. EHH plots and bifurcation diagrams of SNPs rs4452126 and rs11897580 in the forebrain-expressing VISTA enhancer hs1210 in the African population. (a) The EHH plot for SNP rs4452126 clearly separates the derived allele T, consistent with positive selection. EHH = 1 indicates that all haplotypes carrying a given allelic state (ancestral or derived) are identical up to that point. The bifurcation diagram of the derived allele confirms this, showing a long haplotype with no branching at the nodes across the 10.8 kb region. (b) The EHH plot for the SOX2 TFBS modifying allele A of SNP rs11897580 likewise shows evidence of positive selection relative to the ancestral allele T over a 10.8 kb region. The bifurcation diagram shows little branching at the nodes, indicating fewer recombination events and hence longer haplotypes for the derived allele than for the ancestral allele T, especially over a 2.5 kb region [chr2: 66762480–66764997] containing six SNPs (table 3).
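The EHH statistic plotted in these figures can be sketched as follows: among chromosomes carrying a given core allele, EHH at a marker is the probability that two randomly chosen chromosomes are identical across the whole interval from the core to that marker. The code below is a minimal illustration under that assumed definition, not the software used in this study; the function name `ehh` and the eight toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """EHH for chromosomes carrying core_allele, from the core to end_index.

    haplotypes: equal-length strings, one character per SNP.
    Returns the fraction of chromosome pairs that are identical over the
    interval between core_index and end_index (both inclusive).
    """
    lo, hi = sorted((core_index, end_index))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined for fewer than two chromosomes
    classes = Counter(carriers)
    return sum(comb(c, 2) for c in classes.values()) / comb(n, 2)


# Hypothetical six-SNP haplotypes; the core SNP sits at index 3,
# with "A" playing the derived allele and "T" the ancestral allele.
toy_haplotypes = [
    "CTTAAA", "CTTAAA", "CTTAAA", "CTTAAG",  # derived carriers
    "CCTTAG", "CCTTGA", "TCTTAA", "CCATAA",  # ancestral carriers
]

for allele in ("A", "T"):
    values = [ehh(toy_haplotypes, 3, allele, end) for end in range(6)]
    print(allele, [round(v, 2) for v in values])
```

At the core itself EHH is 1 by construction; a derived allele under recent positive selection is expected to keep EHH high over longer distances than the ancestral allele, which is what the toy data are built to show.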

For SNP rs2498442 (G > T) in BE-HAE hs563, haplotype construction revealed a significant downstream rEHH P value for core haplotype 1 (CH1), which carries the derived state of the SNP, again in Africans (table 4 and supplementary table 5, Supplementary Material online).
EHH plots constructed region by region also depict positive selection in Africa: the derived state covers a greater area under the curve, indicating longer haplotypes and stronger linkage disequilibrium, than in the other regional plots (fig. 5 and supplementary fig. 3a, Supplementary Material online). The global trend, however, indicates overall positive selection on the downstream region for the derived allele (supplementary fig. 3c, Supplementary Material online).
Table 4

Core Haplotypes with RUNX1/RUNX3 Binding Site Modifying SNP rs2498442 within VISTA Enhancer hs563 with Each Haplotype’s rEHH Score

Haplotype Frequency
Core Haplotype (CH)      Total   America      Europe       South Asia   Africa       East Asia
CH1  C G G T(a) T C T    1232    0.45 (156)   0.45 (227)   0.62 (303)   0.52 (344)   0.4 (202)
CH2  C A G G A T C       852     0.34 (118)   0.44 (221)   0.25 (122)   0.27 (179)   0.42 (212)
CH3  T G T G A C C       344     0.2 (69)     0.11 (55)    0.13 (64)    0.1 (66)     0.18 (90)
CH4  C G G G T C C       44      0.01 (4)     0            0            0.06 (40)    0
CH5  C A T G A C C       13      0            0            0            0.02 (13)    0
Total                    2492    347          503          489          649          504

rEHH (u, d)
Core Haplotype (CH)      America      Europe       South Asia   Africa       East Asia
CH1                      0.4, 0.5     0.76, 0.54   0.12, 0.32   0.3, 1.89    0.23, 0.63
CH2                      1.63, 1.7    0.76, 1.05   5.5, 2.13    2.03, 0.28   2.07, 0.92
CH3                      1.87, 1.31   5.98, 6.07   5.46, 4.44   3.02, 2.27   2.5, 2.8
CH4                      -            -            -            6.44, 0.64   -
CH5                      -            -            -            5.57, 3.64   -

Note. The table lists SNPs rs62420423, rs9388046, rs4499937, rs2498442, rs2498443, rs13194250, and rs2503789 in core haplotypes covering a 3.7 kb region. Haplotype frequencies are given as proportion (count); u, upstream; d, downstream.

(a) Derived allele T of SNP rs2498442.

The significant rEHH score (bold in the original table) is the downstream score of 1.89 for haplotype CH1 in the African population.
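The rEHH scores reported in these tables compare the EHH of one core haplotype with the EHH of all other core haplotypes combined. The sketch below assumes one common definition, in which the denominator pools identical pairs and total pairs across the other core haplotypes; it is an illustration only and not the tool used in this study. The function names, core haplotype strings, and flanking alleles are all hypothetical.

```python
from collections import Counter, defaultdict
from math import comb


def pair_counts(flanks, k):
    """Return (identical pairs over the first k flanking SNPs, all pairs)."""
    classes = Counter(f[:k] for f in flanks)
    return sum(comb(c, 2) for c in classes.values()), comb(len(flanks), 2)


def rehh(chromosomes, core, k):
    """Relative EHH of `core` at the k-th flanking SNP.

    chromosomes: list of (core_haplotype, flank) string pairs, where flank
    lists alleles in one direction, ordered by distance from the core.
    Returns None when the ratio is undefined.
    """
    by_core = defaultdict(list)
    for core_hap, flank in chromosomes:
        by_core[core_hap].append(flank)
    same, total = pair_counts(by_core[core], k)
    other_same = other_total = 0
    for core_hap, flanks in by_core.items():
        if core_hap != core:
            s, t = pair_counts(flanks, k)
            other_same += s
            other_total += t
    if total == 0 or other_same == 0:
        return None
    return (same / total) / (other_same / other_total)


# Hypothetical chromosomes: three core haplotypes, three flanking SNPs each.
toy_chromosomes = (
    [("CTTAAA", f) for f in ("GGA", "GGA", "GGA", "GGC")]
    + [("CCTTAG", f) for f in ("GGA", "GTA", "ATC")]
    + [("TCTTAA", f) for f in ("GGA", "GGA", "ACC")]
)

for k in range(4):
    print(k, rehh(toy_chromosomes, "CTTAAA", k))
```

A score well above 1 means the tested core haplotype stays homozygous over a longer distance than the other core haplotypes, which is the pattern a long range haplotype test looks for at a given haplotype frequency.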

Fig. 5. EHH plots and bifurcation diagrams for the African population for SNPs rs2498442 and rs6477258 within VISTA enhancers hs563 and hs304, respectively. (a) SNP rs2498442 within enhancer hs563, which is expressed in hindbrain tissue. The African population shows a more pronounced EHH plot, with the RUNX1/RUNX3 TFBS modifying derived allele T (shown in green) covering more area under the curve in the downstream region than the ancestral allele G (shown in red). The bifurcation diagram spanning a 10.25 kb region shows less branching for the derived allele (shown in green), indicating fewer recombination events and longer haplotypes, whereas the ancestral allele has relatively more branching and shorter haplotypes in the same region. (b) SNP rs6477258 within enhancer hs304, which is expressed in midbrain/forebrain tissue. The EHH plot for the FOS/JUND TFBS modifying derived allele T (shown in green) shows greater area coverage in Africa on both sides than the ancestral allele C (shown in red). The corresponding bifurcation diagram for Africa also reveals a longer haplotype, with fewer recombination events (shown as branching at the nodes), for the TFBS modifying allele T than for the ancestral allele C over a 4 kb region.

For SNP rs6477258 (C > T) in BE-HAE hs304, no haplotype in any region had a significant rEHH with either the ancestral or the derived state of the SNP.
EHH plots for the American, East Asian, and South Asian populations for SNP rs6477258 agreed with the global trend, in which the derived state has greater area under the curve in the downstream region; the European population was the exception (supplementary fig. 3b and c, Supplementary Material online). The African population, however, deviated markedly from the other populations and from the global trend: the derived allele showed prominently greater coverage under the curve on both sides of the plot, and less branching in the bifurcation diagram, than the ancestral allele over a region of up to 4 kb (fig. 5).

In sum, our long range haplotype (LRH) based results indicate that the derived alleles in BE-HAEs hs1210 and hs563, which lie in modern human specific binding motifs of SOX2 and RUNX1/3, respectively, are under positive selection in Africa. Because long range haplotypes persist only for short time spans, that is, <30,000 years, we estimate that these two modern human specific variations in binding motifs have undergone recent positive selection in Africans (Barreiro et al. 2008).

It is also interesting to note that the transcription factors occupying the H. sapiens-unique binding sites, such as SOX2 and RUNX1/3, play a vital role in gene expression, especially in the context of neural development. SOX2 is a high mobility group (HMG) box TF that is widely expressed throughout the neural tube and is known to maintain the progenitor character of neural progenitor cells in both the mature and the developing CNS of humans (Hutton and Pevny 2011; Beccari et al. 2012). Runt related (RUNX) genes comprise an evolutionarily conserved group of TFs that are mainly responsible for maintaining lineage specific gene expression (Stifani and Ma 2009). In the mouse CNS, RUNX1 is produced in cholinergic branchial and visceral motor neurons of the hindbrain, whereas RUNX3 expression is confined to the peripheral nervous system (Inoue et al. 2008).

Therefore, this study concludes that human accelerated divergence among enhancers makes a strong case for studying brain evolution in present-day humans. It also highlights the significance of regulatory underpinnings in the genome in comparison with other members of the genus Homo. Hence, keeping brain specific regulatory sequence divergence in mind, we can also build a basis for understanding enhanced brain function and the contribution of regulatory regions to neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

References (10 of 50 shown)

1. W James Kent. BLAT--the BLAST-like alignment tool. Genome Res. 2002-04.

2. Jianzhi Zhang; Rasmus Nielsen; Ziheng Yang. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005-08-17.

3. Scott R Hutton; Larysa H Pevny. SOX2 expression levels distinguish between neural progenitor populations of the developing dorsal telencephalon. Dev Biol. 2011-01-21.

4. Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén. Proteomics. Tissue-based map of the human proteome. Science. 2015-01-23.

5. Kristel M Dorighi; Tomek Swigut; Telmo Henriques; Natarajan V Bhanu; Benjamin S Scruggs; Nataliya Nady; Christopher D Still; Benjamin A Garcia; Karen Adelman; Joanna Wysocka. Mll3 and Mll4 Facilitate Enhancer RNA Synthesis and Transcription from Promoters Independently of H3K4 Monomethylation. Mol Cell. 2017-05-05.

6. Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2013-12-18.

7. Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis. A global reference for human genetic variation. Nature. 2015-10-01.

8. Lucía F Franchini; Katherine S Pollard. Can a few non-coding mutations make a human brain? Bioessays. 2015-08-25.

9. J M Heckmann; H Uwimpuhwe; R Ballo; M Kaur; V B Bajic; S Prince. A functional SNP in the regulatory region of the decay-accelerating factor gene associates with extraocular muscle pareses in myasthenia gravis. Genes Immun. 2009-08-13.

10. Murray Cadzow; James Boocock; Hoang T Nguyen; Phillip Wilcox; Tony R Merriman; Michael A Black. A bioinformatics workflow for detecting signatures of selection in genomic data. Front Genet. 2014-08-26.
