Literature DB >> 28671685

Ancient selection for derived alleles at a GDF5 enhancer influencing human growth and osteoarthritis risk.

Terence D Capellini1,2, Hao Chen2, Jiaxue Cao1, Andrew C Doxey3, Ata M Kiapour4, Michael Schoor2, David M Kingsley2,5.   

Abstract

Variants in GDF5 are associated with human arthritis and decreased height, but the causal mutations are still unknown. We surveyed the Gdf5 locus for regulatory regions in transgenic mice and fine-mapped separate enhancers controlling expression in joints versus growing ends of long bones. A large downstream regulatory region contains a novel growth enhancer (GROW1), which is required for normal Gdf5 expression at ends of developing bones and for normal bone lengths in vivo. Human GROW1 contains a common base-pair change that decreases enhancer activity and colocalizes with peaks of positive selection in humans. The derived allele is rare in Africa but common in Eurasia and is found in Neandertals and Denisovans. Our results suggest that an ancient regulatory variant in GROW1 has been repeatedly selected in northern environments and that past selection on growth phenotypes explains the high frequency of a GDF5 haplotype that also increases arthritis susceptibility in many human populations.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28671685      PMCID: PMC6556117          DOI: 10.1038/ng.3911

Source DB:  PubMed          Journal:  Nat Genet        ISSN: 1061-4036            Impact factor:   38.330


INTRODUCTION

Recent genome-wide association studies (GWAS) have mapped many loci associated with complex traits and diseases. While the relative risks conferred by GWAS loci are often small, high frequency of some alleles can nonetheless account for a substantial health burden in human populations[1]. For example, single nucleotide polymorphisms (SNPs) linked to the Growth Differentiation Factor 5 (GDF5) gene are among the best replicated risk factors for adult-onset osteoarthritis (OA)[2]. These alleles elevate OA risk for specific joints by ~1.2 to 1.8 fold[3,4], and are also linked to lumbar disk degeneration[5]; hip dysplasia[6]; Achilles tendonopathies[7]; and meniscus tears[8]. Interestingly, risk variants are present at high frequencies in many populations[3,9]. While the relative risk due to GDF5 variants may be moderate, their high prevalence contributes a significant fraction of OA risk in many human populations[10]. High frequency GDF5 risk alleles show multiple molecular signatures of positive selection during recent human evolution[11-14]. Although the causal basis of selection is still unknown, advantageous changes have presumably occurred at one or more body sites influenced by GDF5 activity. Previous studies have identified multiple structures controlled by GDF5. In mice, Gdf5 null mutations cause shorter feet and limbs; missing joints in digits, wrists, and ankles; altered tendons; missing knee ligaments; and increased risk of OA[15-19]. Likewise, human GDF5 mutations cause short stature; short digits; joint dislocations or fusions; and hip and knee joint dysplasia often presenting with OA[20-23]. Interestingly, two common SNPs in the 5’ untranslated region (5’UTR) of the human gene (rs143383 and rs143384) are associated with a 1.3 to 1.8 fold increased risk of OA in Japan and China[3]. Functional studies show that ‘T’ alleles at these SNPs reduce reporter gene expression in cultured articular cells[3], and are linked with reduced GDF5 transcript levels in patient articular cartilage[24]. Both 5’UTR variants are also associated with height variation in humans[9,25-29]. Homozygosity for T alleles reduce height by approximately 1 cm[9], one of the largest effects identified for GWAS height loci[30]. Reduced height, and increased OA risk, could be due to the 5’UTR mutations, or to other linked variants[9]. Despite high interest in GDF5 5’UTR variants, the candidate genomic regions found in most association studies extend from several kilobases upstream (5 to 100 kb downstream (3) of GDF5 coding exons (Supplementary Fig. 1). To better characterize the overall regulatory structure of GDF5, we surveyed upstream and downstream regions of the human and orthologous mouse locus using transgenic reporter assays and functional rescue experiments in vivo. We find a large region 3’ of GDF5 is required for both normal joint development and bone length. Interestingly, the selection signatures found in humans are centered on a growth enhancer we identify in the 3 region, not the 5UTR. The selection peak includes an ancient genetic variant present in Neandertals, Denisovans, and modern human populations that we show reduces the activity of the growth enhancer. Likewise, deletion of the growth enhancer from mice reduces limb length and Gdf5 expression levels in vivo. We propose that selection on the derived variant in this growth enhancer underlies adaptive evolution of the GDF5 gene in both ancient and modern humans, and has led to the high prevalence of a linked risk haplotype that contributes to common skeletal diseases in many populations.

RESULTS

Regulatory and Functional Scans of Gdf5

To identify regulatory sequences surrounding the mouse Gdf5 locus, we used a Bacterial Artificial Chromosome (BAC) scanning approach[31]. Two BAC clones, each containing functional Gdf5 coding sequences, were modified by inserting a lacZ reporter into the Gdf5 transcription unit, and used to generate transgenic mouse lines (Fig. 1). As we recently reported[32], regulatory elements within the UPSTREAM BAC, spanning 110 kb upstream to 30 kb downstream of Gdf5 exons, drove lacZ expression predominately in embryonic proximal limb joints and axial structures, whereas elements in the DOWNSTREAM BAC, including an additional 110 kb downstream of Gdf5, drove expression in proximal and distal limb joints, including strong digit expression. More detailed analysis revealed that the DOWNSTREAM BAC also drives additional stripes of lacZ expression in the growing ends of long bones rather than joints (Fig. 1). This novel expression specifically resides along a lateral margin of the developing chondrocyte growth plate within and subjacent to the perichondrium, a region where the endogenous Gdf5 gene is also expressed (Fig. 1 and Supplementary Figs. 2 and 3). Mice with Gdf5 null mutations have reduced limb lengths, and this expression is consistent with the known requirement for GDF5 in control of growth plate dynamics[17].
Figure 1.

A regulatory scan of the Gdf5 region.

(a) (Top) Genomic positions of BAC constructs covering UPSTREAM (yellow) and DOWNSTREAM (green) sequences surrounding the mouse Gdf5 locus. Vertical blue lines denote position of lacZ cassette engineered in Gdf5 transcript. (a) (Bottom) The DOWNSTREAM BAC drives a broader lacZ expression pattern in long bones and joints compared to the UPSTREAM BAC, and uniquely controls lacZ expression in growth collars (GC) and broad expression domains in articular (*) surfaces of femur and tibia. Whole embryo panels shown at E15.5 (N=3 per line). Dissected long bones at E16.5 (ventral views). s (shoulder), e (elbow), w (wrist), h (hip), k (knee), a (ankle), d (digit) joints. Scale bars = 2 mm (embryo), 1 mm (bone). (b) Localization of the DOWNSTREAM BAC Gdf5 lacZ signal in the proximal femoral head (top) and proximal tibia (bottom) growth collars at E16.5. Gdf5 lacZ signal is detected in the chondrocyte cell collar, within and subjacent to perichondrial and adjacent tendons cells. Gdf5 lacZ signal is also detected in femoral head and proximal tibia articular cartilage. Scale bars = 200 um (sections).

To test whether different regulatory regions are required for different Gdf5 functions, we bred the BAC transgenes, each carrying functional Gdf5 coding regions, separately onto a Gdf5 null background provided by mouse brachypodism (bp) mutations[16]. Brachypodism mutants have approximately 10% shorter long bones and 50% reduction in metapodial lengths compared to wild-type and heterozygous littermates[15]. The UPSTREAM BAC allele did not rescue long bone growth phenotypes, although it did improve metapodial lengths (Fig. 2 and Supplementary Fig. 4). Conversely, the DOWNSTREAM BAC allele completely restored all major long bones and metapodial lengths to control lengths (Fig. 2 and Supplementary Fig. 4). Therefore, key functional cis-regulatory sequences controlling limb growth reside at least 30 kb downstream of Gdf5.
Figure 2.

DOWNSTREAM BAC rescues long bone and digit growth.

(a) Femoral length is shown as a black line in hindlimb cartoon with femur highlighted in gray. The shortened femur of Gdf5 mutant (bp/bp) mice is rescued to control (bp/+;BAC-) lengths by the presence of the DOWNSTREAM (green)(p = 6.839 × 10−9) but not UPSTREAM (yellow)(p = 0.152) Gdf5 BAC constructs (N=10 skeletal elements per genotype). (b) Digit length measurements indicated for Digit #3 (grey) and Metatarsal #3 (MT3) in hindpaw cartoon. When compared as a ratio to control (bp/+;BAC-) Digit3 or MT3 lengths, DOWNSTREAM BAC sequences fully restored bp/bp Digit3 lengths (p = 3.793 × 10−20) and MT3 lengths (p = 8.339 × 10−20) to control values, while UPSTREAM BAC sequences failed to rescue bp/bp Digit3 lengths (p = 0.102) and MT3 lengths to control levels, although MT3 lengths did significantly improve (p =2.327 × 10−5) (N=10 skeletal elements per genotype). The p-values reflect results obtained from unpaired, two-tailed t-tests between control bp/bp;BAC- and bp/bp;BAC+ samples; see Statistical Methods for further details on plots.

Identification of a Growth Collar Enhancer

To further delineate downstream regulatory sequences, we tested whether smaller human or mouse clones drove consistent lacZ expression in transgenic mice. Different human sequences independently controlled expression in limb joints (41 kb) or growth collars (37 kb) (Fig. 3). The 41 kb human clone drove expression in multiple joints, and contains multiple joint control elements, as recently described[32]. In contrast, the 37 kb human clone drove expression in growth collar domains similar to the DOWNSTREAM BAC (Fig. 1). Growth zone patterns were also found with a smaller ~12 kb mouse sequence identified by a separate scan across the interval (data not shown). Truncation mapping of the orthologous human region revealed a 2.54 kb enhancer (Fig. 3 and Supplementary Fig. 5) that we call GROW1 (GROWTH 1). GROW1 is marked with typical chromatin signatures of enhancers in developing mouse limbs and human chondrocytes (Supplementary Information). Two regions of evolutionary conservation were seen within GROW1, one conserved through amniotes (GROW1A); and one conserved through placental mammals (GROW1B) (Figs. 3 and 4, and Supplementary Fig. 5). Enhancer tests of each sequence showed that both drove long bone expression, although the GROW1B pattern was stronger and extended across the growth collar (Supplementary Fig. 6).
Figure 3.

Fine mapping of a human GDF5 growth collar enhancer.

(a) Indicated regions from human (hg19) GDF5 were tested for ability to drive lacZ expression in developing forelimbs (FL), hindlimbs (HL), and femoral neck and heads. (b) Two fosmid clones, covering 37 kb (dark purple, far left) and 41 kb (light blue, far right) downstream sequence drive expression at E14.5 either in growth-plates or joint stripes, respectively (N=3 per fosmid line). The 37 kb growth collar lacZ pattern matches that driven by a 2.5 kb sequence (red; GROW1) at E15.5. GROW1 contains at least one smaller 980 bp sequence (yellow, GROW1B), capable of driving expression in growth plates (also see Supplementary Figs. 5 and 6). The 41 kb sequence drives strong expression in distal joints, e.g., knee (k), elbow (e), and digital joints (d) and contains three previously identified joint enhancers (gray; R3, R4, and R5)[32]. Scale bars = 1 mm (limbs). (c) When compared to the DOWNSTREAM BAC (green, left), the 37 kb growth region construct (purple) as well as the 2.5 kb (red) and 980 bp (yellow) constructs drive restricted patterns in the proximal femoral neck and shaft, whereas the 41 kb joint region construct (blue) drives expression confined to the femoral head (N > 5 per construct). An asterisk (*) indicates that the trochanter has not yet fully formed at E14.5. Scale bars = 500 um (proximal femora).

Figure 4.

A functional variant in the human GDF5 growth plate enhancer.

(a) Position of rs4911178 within a 30 vertebrate Multiz alignment (UCSC browser, hg19). A common derived human variant “A” (blue) occurs at an otherwise highly conserved ancestral “G” (red) base in the GROW1B region. (b) In vivo lacZ reporter assay of ancestral “G” and derived “A” GROW1B enhancers (with constructs shown below dashed line in (a)) that differ only at the rs4911178 position. GROW1B with the derived “A” variant maintains facial expression but drives lower lacZ expression in long bone growth collars. Specifically, for the derived enhancer weak growth collar expression was observed in 2 of 11 independent transgenic embryos with facial expression, whereas for the ancestral enhancer growth collar expression was observed in 6 of 9 independent transgenic embryos with facial expression. Hindlimb view highlights the marked difference between both constructs. Scale bars = 2 mm (embryo), 1 mm (hindlimb). (c) In vitro analyses of GROW1B enhancer luciferase activity in Chon-002 cells. The derived “A” GROW1B variant (blue) drives significantly lower expression than the ancestral “G” GROW1B variant (red) (p =7.7 × 10−10). The p-values from independent experiments were combined across seven experiments using Fisher’s combined probability test. See Statistical Methods for further details on plots.

Human Sequence Variants in GROW1

Given the association of GDF5 with height variation and OA risk, we queried 1000 Genomes Project data[33] to identify common human single nucleotide polymorphisms (SNPs) in a 400 kb region surrounding GDF5 (Supplementary Table 1). Consistent with other studies[3,9,14,34], we found no common protein-coding substitutions that could explain the associations (Supplementary Table 1). Instead, we identified many non-coding variants, nineteen of which exist within the newly mapped GROW1 enhancer. Of these, 14 are rare (MAF <.01), 4 are low frequency (MAF <.03), and 1 is a common variant (MAF >.05)(Supplementary Table 1). The common variant is a guanine (G) to adenine (A) substitution (rs4911178) mapping within a highly conserved sequence within GROW1B (Supplementary Fig. 5). Strikingly, this “G” is perfectly conserved in placental mammals, whereas “A” occurs at high frequencies in some human populations (Figs. 4 and 5A).
Figure 5.

rs4911178 global allele frequencies and signatures of selection in the GROW1B region.

(a) The derived “A” (blue) allele at rs4911178 is present at high frequencies in many out-of-African populations compared to the ancestral “G” (red) allele (geographic distribution from Human Genome Diversity Project Selection Browser[57]. (b) CMS[37] selection score in JPT+CHB populations. The composite of multiple signals of positive selection is centered on the rs4911178/GROW1B region. See Methods for further details on plots.

To further examine the functional consequences of this substitution, we recreated the rs4911178 base pair change in GROW1B expression constructs, and compared lacZ expression patterns in transgenic embryos (Fig. 4B). Strikingly, the derived “A” variant significantly reduced the expression of the growth enhancer in long bones, but had no observable effect on facial expression, which served as an internal control. We also cloned each GROW1B variant upstream of a luciferase gene, and introduced constructs into a human fetal femoral growth-plate chondrocyte cell line (Fig. 4C and Supplementary Fig. 7). The ancestral GROW1B sequence drove strong luciferase expression compared to control vectors, confirming its regulatory activity in human cells. The derived GROW1B enhancer variant drove significantly lower expression than the ancestral variant (0.43–0.82 fold; 500ng average 0.72; N=7; p =7.7 × 10−10) confirming that the human G to A substitution reduces activity of the GDF5 enhancer in human growth collar chondrocytes (see Supplementary Methods).

Distribution of GDF5 Variants in Modern and Ancient Humans

We next explored the relationship of the rs4911178 variant to geographic patterns, trait associations, and signatures of selection in humans (Figs. 5 and 6 and Supplementary Information). The lower activity A variant of GROW1 occurs at high frequencies in Eurasians but low frequencies in Africans (Figs. 5A and Supplementary Table 1). Within Eurasians, the A allele is found near the center of a ~130 kb haplotype that extends from the GDF5 5’UTR through the downstream region that controls limb length in mice (Supplementary Figs. 8 and 9). SNPs within this region show high linkage disequilibrium with each other (D’ >0.98; Supplementary Table 2), including tight linkage between rs4911178 and other SNPS previously associated with increased OA risk (rs143383, r2=0.92), and decreased height in human populations (rs143384, r2= 0.88; rs6088813, r2= 1; rs725908, r2= 1; rs6060373, r2= 0.96; rs6088792, r2= 0.64; rs6060369, r2= 0.96) (Supplementary Fig. 10). Although rs4911178 has not been genotyped in all GWAS studies, several reports confirm that the derived “A” allele is significantly associated with shorter height (Supplementary Information).
Figure 6.

Evolutionary history of the GDF5 locus in humans.

(a) Maximum Likelihood tree (left) and visual genotype (right) of phased 1000G, Neandertal, Denisovan, Chimpanzee haplotypes. (left) two clades (A & B) with strong bootstrap support exist, both containing haplotypes from Europe (black), Asia (orange), and Africa (green). Within Clade B, a high frequency haplotype B* occurs in Eurasian individuals. (right) Visual genotype across the 130 kb GDF5 regulatory locus. Clades A and B are partitioned based on the variant at rs4911178. Ancestral (red) or derived (blue) states are shown for SNPs across the interval. Thin line demarcates Clade A and B haplotypes; dashed line demarcates B and B*. Within Clade B, Neandertal and Denisovan haplotypes are most related to haplotypes only in Africa, and are ancestral to the “short height” B* haplotype found in Eurasians. (b) Three models of haplotype evolution at GDF5 locus. Blue X, shared mutations; black X, independent mutations. (c) Multiple GDF5 haplotypes were present in middle-to-late Pleistocene Africa, including haplotype B harboring the derived lower activity growth variant. Neandertal and Denisovan ancestors carried this or a related “shorter-height” haplotype into northern latitudes (dashed lines). During late Pleistocene, haplotypes (B*) related to (B) arose and were positively selected during more recent out-of-Africa migration(s). B* haplotype leads to decreased height via the derived rs4911178 variant in GROW1B (yellow). This variant travels with linked 5’UTR functional variants (e.g., rs143384)[3] as well as variants within adjacent R2 (light purple) and downstream R3, R4, and R5 joint enhancers (light green)[32].

To further examine the evolutionary history of GDF5, we used phased 1000 Genomes data to analyze 1486 haplotypes spanning the 130 kb linkage block (Fig. 6A and Supplementary Table 3). Removing low frequency, putative recombinant haplotypes resulted in 1091 haplotypes used to construct a Maximum Likelihood phylogenetic tree. This tree reveals two well-supported clades differing at rs4911178 (Fig. 6A and Supplementary Fig. 11; Supplementary Tables 4 and 5). Phylogenetic analyses using all haplotypes yielded similar results (Supplementary Figs. 12 and 13). Clade A, consisting exclusively of haplotypes harboring the ancestral allele at rs4911178 (398/398), shows marked diversity within and between continents, and is the most frequent haplogroup in Africans (145/202; 71.8%). Conversely, Clade B consists of many closely related haplotypes that all possess the lower activity derived growth variant at rs4911178 (693/693). This clade shows less sequence diversity, and is the predominant haplogroup in Eurasians (i.e., 636/889; 71.5%) (Supplementary Table 4). A subgroup of haplotypes within the B clade, which we denote B*, is particularly common in Eurasians, but also found at low levels in Kenya (1/202; 0.5% African haplotypes). Analysis of this African individual with Ancestry Informative Markers shows exceptionally low to no detectable evidence of admixture with Europeans[35,36] (data not shown). Additional Africans (Luhyan and Yoruban) harboring B* haplotypes exist in the full dataset, including low frequency haplotypes (N=5) (Supplementary Figs. 12 and 13). Interestingly, other Luhyans and Yorubans have haplotypes that show many but not all of the derived SNPs characteristic of B* (Fig. 6A). These are candidate proto-haplotypes that could have given rise to the B* haplotype. The high differentiation of GDF5 allele frequencies between populations; and the prevalence, size, and low sequence diversity of the B* haplotype in non-Africans, all suggest that B* has been subject to positive selection in Eurasians[1114]. The Composite of Multiple Signals (CMS) method combines multiple selection signatures to map base pair changes that have most likely served as the causal basis of positive selection[37]. When applied to GDF5, CMS shows a selection peak lying far downstream of GDF5 coding exons, in the region where we have mapped GROW1B (Fig. 5 and Supplementary Fig. 14). Notably, rs4911178 is one of the top three highest scoring SNPs in the entire 130 kb region. The two other top scoring SNPs reside in sequences that did not show reproducible enhancer activity in transgenic mice (Supplementary Fig. 5; data not shown). Our studies thus suggest that the noncoding change at rs4911178 provides the most likely molecular basis for recent strong selection of GDF5 in non-Africans. While modern Eurasian populations arose from out-of-African migrations beginning 130,000–50,000 years ago, Neandertals and Denisovans are descendent from an older migration out of Africa that occurred >600,000 years ago. We analyzed Neandertal[38,39] and Denisovan[40] sequences to reconstruct GDF5 variants present in extinct hominins (Supplementary Table 3). Like many Eurasians, Neandertals and Denisovans show the lower activity A allele of GROW1B, rather than the ancestral G allele highly conserved in other mammals (Supplementary Table 3). Phylogenetic analysis shows that Neandertal and Denisovan haplotypes cluster within Clade B with haplotypes present in Yorubans and Luhyans of Africa, consistent with sequence divergence comparisons (Fig. 6A, and Supplementary Table 6). Notably, this Clade B subgroup falls outside the B* cluster that is prevalent in Eurasians, and does not share the same derived SNP changes in the GDF5 5’UTR (e.g., rs143384) that have been identified in OA association studies. These data suggest that an ancient low growth variant was present in the human lineage before the last common ancestor of Neandertals, Denisovans and modern humans, and has become the predominant GDF5 variant in both archaic and modern Eurasians.

Functional Consequences of GROW1 Inactivation

To further examine the phenotypic consequences of changes in the Gdf5 growth enhancer, we used CRISPR/Cas9 editing[41] to delete the GROW1 enhancer from the mouse locus (see Supplementary Information). GROW1 deletion mice are missing a 1.8 kb region, including the mouse orthologue of the rs4911178 site and highly conserved surrounding sequence. (Like other mammals, all sequenced wild-type laboratory strains of mice carry a G at the rs4911178 position[42], similar to the human ancestral sequence common in Africans). GROW1 deletion mice were viable, fertile, and born at expected Mendelian ratios (see Supplementary Information). To look for possible effects of enhancer loss on gene regulation, we generated F1 hybrid mice heterozygous for both GROW1 deletion and control alleles, and compared the levels of expression of surrounding genes in RNA samples prepared from the growing ends of long bones. Loss of GROW1 significantly reduced the expression of the linked Gdf5 allele, but not the linked upstream Cep250 or downstream Uqcc1 alleles (deletion allele Gdf5 expression 85±1.48% of control allele expression, p=0.02857; deletion allele Cep250 expression 108±1.78% of control allele expression, p=0.2; deletion allele Uqcc1 expression 88±1.29% of control allele expression, p=0.6286; Permutation Test). These results confirm that GROW1 is a cis-acting regulatory enhancer required for normal levels of Gdf5 expression at ends of bones. To test for additional phenotypic consequences of the GROW1 deletion, we analyzed bones of GROW1−/− and control mice with micro-computerized tomography (uCT) (Fig. 7). The overall lengths of the femoral neck, femur, and tibia were shorter by ~7.5% (p=0.007), 2.5% (p=0.02), and 2% (p=0.01) in mutant animals, respectively. GROW1 is thus required for normal long bone growth, and controls phenotypes that could contribute to both height variation and fracture risk in humans (see Discussion).
Figure 7.

GROW1 regulates long bone length in vivo.

(a) Morphometric measurements on P30 femur (left) and tibia (right) samples that underwent μCT (N=12 skeletal elements per genotype). Red lines indicate the following measurements: Femoral Neck Length (A), Femoral Length (B), Femoral Head Height (C), and Tibial Length (D). Yellow dashed circle outlines maximum diameter of the femoral head to specify its anatomical center; white dotted lines depict planes used to accurately measure length. See Methods. (b) Comparison of normalized measurements for GROW1 mutant and GROW1 (control) mice. Compared to GROW1 littermates (dark boxes), GROW1 mice (light gray boxes) have significantly shorter femoral neck lengths (p=0.007), femoral lengths (p=0.02), femoral head heights (p=0.02), and tibial lengths (p=0.01). The p-values reflect results obtained from unpaired, two-tailed t-tests. See Statistical Methods for further details on plots.

DISCUSSION

Prior to our studies, the most studied functional SNPs in the human GDF5 gene have been rs143383 and rs143884 in the 5’UTR. Changes at these positions reduce gene expression in cultured cells[3], and have been proposed as the likely basis of OA and height associations in humans[9,14]. Because they are located within the mature GDF5 transcript, changes in the 5’UTR would likely affect most GDF5 functions. In contrast, our studies show that GDF5 is controlled by a distributed set of regulatory elements that allow separable regulation in joints and the growing ends of bones (Figs. 1 and 3). Several different enhancers are required to drive GDF5 expression in different joints in the skeleton[32]. In addition, the 3’ region also contains the novel GROW1 enhancer characterized here, which is expressed in growing ends of long bones, rather than joints between bones (Fig. 3). BAC rescue experiments show that the large 3’ region, containing GROW1, is required for restoration of normal bone lengths in Gdf5 mutants (Fig. 2 and Supplementary Fig. 4). Targeted deletion of the GROW1 enhancer also reduces Gdf5 expression and length of long bones in mice, confirming that this enhancer functions as a growth controlling sequence in vivo (Fig. 7). Interestingly, the orthologous GROW1 sequence in humans harbors a common derived SNP (rs4911178) at an otherwise highly conserved position. This change alters a predicted binding site for PITX1, a quintessential regulator of hindlimb growth and patterning in many animals[43] (Supplementary Fig. 15 and Supplementary Table 7). Experimental assays show that the derived sequence change reduces PITX1 binding interactions in vitro (Supplementary Fig. 15 and Supplementary Table 7), and reduces GROW1 enhancer activity in both transgenic mice and in human growth plate chondrocytes (Fig. 4). We therefore propose that rs4911178 is a causal variant contributing to height variation in humans. Interestingly, haplotypes containing the derived GROW1 variant show clear evidence of positive selection in Eurasians[11-14]. Most recently, the CMS measure[37] has narrowed the region of positive selection, and highlights rs4911178 as one of the most likely SNPs responsible for positive selection at GDF5 (Fig. 5). More than 90% of adaptive regions narrowed by CMS scores map to non-coding regulatory regions of genes[37], though most causal regulatory variants are still unknown. Our studies add an interesting new example of a human growth regulatory variant that has been positively selected in modern Eurasians. Unlike the small handful of previously known regulatory changes controlling lactose tolerance[44], eye color[45], or hair color[46], we find the same regulatory change in both archaic and modern Eurasians, suggesting a long-standing role of GDF5 in modifying human growth and morphology. Because regulatory mutations typically only alter a subset of normal gene functions, finding signatures of positive selection centered on a particular enhancer can provide useful clues about the likely biological basis of past selection. GDF5 clearly functions in bone growth, as well as joint, tendon, and ligament formation, based on multiple alterations in GDF5 coding mutants[15-18]. In contrast, the selected rs4911178 variant occurs in a non-coding growth collar enhancer, suggesting that specific changes in bone growth rather than joint, tendon or ligament formation, have been the biological basis of positive selection in humans. The possible selective advantage of decreased bone growth in Eurasians is still unknown, but could include positive assortative mating, sexual selection, energetic control[14], thermoregulation[47,48], or decreased hip-fracture risk[10]. Limb proportions in many endothermic animals follow a classic geographic pattern called Allen’s rule, in which species at higher, colder latitudes evolve shorter distal appendages than closely related species at lower, warmer latitudes[47]. Shorter appendages conserve body heat and lessen risk of frostbite on extremities in colder environments[47]. Many studies indicate that cold adaptation has also influenced limb proportions in both modern humans and Neandertals[49,50]. Mouse Gdf5 mutations are known to disproportionately affect limb and foot length while maintaining size of the axial skeleton[15,16]. Similarly in humans, the common GDF5 haplotype affects standing height but not sitting/trunk height[28], and is also associated with smaller foot size[51]. The prominent effects of the Gdf5 gene on limb growth, and the strong signatures of selection seen over GROW1 in Eurasians, suggest that regulatory changes in this gene may have contributed to classic Allen’s Rule patterning of limb proportions in humans. We note, however, that derived GDF5 allele frequencies do not show simple latitudinal clines in extant modern populations (Fig. 5), and other factors likely contribute. Interestingly, our mouse knockout studies show that GROW1 has a significant effect on the length of the femoral neck in addition to overall bone length (Fig. 7). Femoral neck length is a key component of hip-axis length with longer hip-axis lengths associated with increased hip fracture rates in humans[52]. The derived allele at the GROW1 enhancer may explain the reduced femoral neck length and reduced fracture risk previously seen in human GDF5 association studies[10], and provides another possible selective advantage in humans migrating to particular environments. It is striking that the same GROW1B mutation is found in Neandertals, Denisovans, and modern Eurasians. Similar mutations in different populations can arise through independent mutations occurring at the same DNA position, introgression of alleles from one population to another, or independent selection of an ancestral variant already present in the common ancestor of different lineages (Fig. 6B). For GDF5, we can effectively rule out independent mutations because many flanking SNPs also show identical changes across a larger linked region, indicative of a shared haplotype rather than a recurrent mutation at rs4911178. Interestingly, the shared haplotype does not include rs143383 and rs143384 in the GDF5 5’UTR, again highlighting the importance of the 3’ flanking region, and not the 5’UTR, in the biological functions likely selected at northern latitudes in archaic and modern humans. Introgression of multiple genomic regions from Neandertals and Denisovans to modern humans is strongly supported by recent studies[38-40,53-55]. However, independent selection of ancestral standing variants can provide an alternative mechanism for allele sharing. In this scenario, ancient variants are already present in a common ancestral population, with the same preexisting variants independently selected in subsequent lineages. Our sequence analysis of the GDF5 locus suggests that haplotype sharing by archaic and modern humans has most likely arisen through this mechanism. Both A and B haplotype clades are present in modern Africans, including individuals with little evidence of recent admixture. The Neandertal and Denisovan GDF5 haplotype group fall within the B clade, but represents a long branch within this group, with multiple sequence differences that distinguish B and other B* haplotypes (Fig. 6). Previously studied cases of introgression from Neandertal to modern humans typically show few base pair differences between the Neandertal and modern alleles (e.g., zero differences for several Neandertal HLA Class I alleles)[53]. In contrast, we find 0.04% divergence between the Neandertal B and modern human B* haplotypes, consistent with an extended period of sequence divergence within the B haplogroup (Supplementary Table 6). In addition, the closest nucleotide similarities between Neandertal and Denisovans and modern humans are found in Africa rather than Eurasia, the opposite of the geographic pattern expected if archaic alleles had introgressed into modern populations migrating to northern latitudes. The high frequency of GDF5 B* haplotypes in modern Eurasians and the occurrence of long branch B haplotype in Neandertals and Denisovans thus most likely represent separate selection events, each of which relied on ancient genetic variation present in Africa, but occurred during separate out-of-Africa migrations (Fig. 6C). An important consequence of positive selection on the GDF5 locus is that the B* haplotype is now present in billions of people worldwide. This has significant human health implications, because the selected B* haplotype is associated with a roughly 1.2–1.8 fold increased risk of OA, one of the largest and most replicated genetic associations found for this prevalent form of human joint disease[2-4]. Although past positive selection for a disease-causing haplotype may seem paradoxical, OA typically develops at post-reproductive ages. Because evolutionary fitness requires successful reproduction, alleles that confer benefits at young or reproductive ages may be positively selected in populations, even if they have some deleterious consequences at post-reproductive ages[56]. The particular variants within B* haplotypes that cause increased OA susceptibility are still not clear. The 5’UTR rs143383 variant reduces reporter gene expression in cultured cells[3], and is also associated with lower expression of linked genes in patient articular cartilage[24]. However, the B* haplotype extends over a ~130 kb region that also includes large and distributed system of regulatory enhancers required for normal expression and function of GDF5 in both joints[32] and growing bones (Fig. 6A). Genetic variants throughout this haplotype travel in linkage disequilibrium with the selected GROW1 variant rs4911178, and could contribute to changes in OA susceptibility. Interestingly, previous studies suggest that the risk of GDF5-associated arthritis differs in hip, knee, wrist, and finger joints[4]. Some of this anatomical specificity may be related to genetic changes in different joint-specific enhancers, a possibility made more likely by the multiple GDF5 control regions controlling expression in different joints found in the proximal and distal regions of the developing limb[32]. Future studies can focus on testing additional human sequence variants within these proximal and distal joint enhancers that may contribute to the very widespread health and morphological effects of the GDF5 locus in modern populations.

ONLINE METHODS

BAC, Fosmid, and Hsp68 LacZ Plasmid Constructs

The 140 kb UPSTREAM BAC (306G24) and 200 kb DOWNSTREAM BAC (RP23–316K12), spanning the Gdf5 region, were isolated from mouse libraries and modified by insertion of an internal ribosome entry site-beta galactosidase reporter cassette (pIRES-B-Geo) in the 3’UTR of the last exon of Gdf532. The size and orientation of the BAC inserts, as well as the sequence of the intact Gdf5 coding exons and targeted 3’UTR regions, were verified using restriction mapping, pulse-field gel electrophoresis, Southern analysis, and Sanger sequencing. Two fosmids, ABC8–2134240-H3 and ABC12–46947700-I24, generated by the Human Genome Structural Variation Project[58], were chosen based on their overlap with conserved non-coding regions downstream of the UPSTREAM BAC (see Supplementary Table 8). Fosmid ABC8–2134240-H3, ~37 kb in size, harbors an ancestral human haplotype, as determined by sequencing, and comes from a Yoruban individual. ABC12–46947700-I24, ~41 kb in size, also harbors an ancestral haplotype but comes from a CEPH individual heterozygous at this locus. Each fosmid bacterial stab was streaked out on chloramphenicol plates and individual colonies were picked and grown in LB plus chloramphenicol. Following purification using Nucleobond Xtra Maxi Plus kits (Macherey-Nagel #740416.10), each fosmid was linearized using the NotI restriction enzyme (New England Biolabs #R0189L), purified via three rounds of micro-injection buffer exchange using Centriprep-30 concentrators (Amicon #4306), and run on a CHEF gel overnight. Successfully digested clones were then SNP- and end-sequence verified, and passed through Centrex MF-15 filters (Schleicher and Schnell # 10467004). Finally, each prepared fosmid was co-injected and co-integrated with a NotI linearized hsp68 basal promoter-lacZ reporter vector. Evolutionarily conserved non-coding regions tested for enhancer activity were either PCR amplified using primers containing appropriate linker sites from C57BL/6 mouse genomic or human fosmid DNA, or synthesized by GenScript. Each individual region was cloned into a modified (see below) or unmodified p5’NotI hsp68 lacZ expression vector containing a minimal heat shock promoter and a lacZ cassette. Supplementary Table 8 lists all the constructs used in this study, along with their primers, and source DNA. For concatenated constructs, primers containing SfiI linker sequences were used. Their PCR products were ligated briefly to form tandem copies before cloning into a modified p5’NotI hsp68 lacZ vector containing a SfiI restriction site that was inserted between two NotI sites upstream of the lacZ cassette. Ligates were next transformed into DH10 cells, streaked on LB ampicillin plates, single colonies were then picked, PCR screened, sequenced, and grown in the presence of ampicillin using standard Maxi-Prep kits (Qiagen). Finally, successful clones were processed for micro-injection as described below.

Transgenic Mice

Transgenic mice were generated by pronuclear injections carried out by the Stanford Transgenic Facility or by Taconic/Xenogen Biosciences Inc. All constructs were linearized using NotI and then purified for micro-injection as described above. Constructs were next micro-injected into derived FVB or C57BL6/CBAF1 oocytes. Founder transgenic embryos were collected at E14.5 or E15.5 and X-gal stained. For each transgene, multiple embryos derived from independent injections were compared for their expression patterns and only reproducible patterns were counted as significant. Stable transgenic mouse lines were also generated for both BAC clones (N=3 for UPSTREAM BAC; N=5 for DOWNSTREAM BAC). For each stable line, timed matings were established and the resultant progeny were X-gal stained and then directly compared to results from multiple independent founder transgenic embryos. Multiple stable lines were made for each construct as described[32]. For time course studies, embryos from stable transgenic lines were collected from E14.5 through E16.5, and at specific post-natal time points up to 6 months. All mouse procedures in this and other sections were done in accordance with protocols approved by the Stanford University and Harvard University Institutional Animal Care and Use Committees.

X-gal Staining

Whole mount staining for B-galactosidase activity was performed as described[32]. Embryos were hemi-sected and fixed in 4% paraformaldehyde (PFA) (Sigma #158127) at 4 degrees Celsius according to gestational day guidelines. Fixed embryos were washed 3 times in wash buffer and stained for 16–24 hours in the dark with 1mg/ml X-gal (Sigma#B4252) in staining buffer at room temperature. For comparisons between ancestral and derived variants, embryos were processed following the exact same protocols. After staining, embryos were briefly rinsed in wash buffer and post-fixed in 4% PFA for 5 hours.

In situ Hybridization Experiments

Antisense and sense digoxigenin-labeled probes for in situ hybridization were generated for Gdf5, Col1a1, and Col2a1 as described[32,59]. To compare Gdf5 lacZ expression, whether driven by the UPSTREAM BAC, DOWNSTREAM BAC, or GROW1 regulatory sequence, to endogenous gene expression, E15 lacZ+ embryos were harvested, snap-frozen, and embedded in Tissue-Tek OCT compound. Embedded embryos were then serially sectioned on a cryostat. Adjacent sections (N= 3 per experiment) were either stained for lacZ expression using the X-gal staining methods described above, or used in standard in situ hybridization protocols as described[32].

BAC Rescue Experiments and Genotyping of Progeny

Independent mouse lines carrying either the UPSTREAM BAC or the DOWNSTREAM BAC were crossed to animals homozygous for the brachypodism allele (URL1, see below). Multiple different DOWNSTREAM BAC (N=5) and UPSTREAM BAC (N=3) founders were used in the above crosses to control for the integration site of each BAC in the genome. Brachypodism mutations result in frameshifts and premature translational termination in the Gdf5 open reading frame[16]. Animals that were bp/+;Gdf5-BAC+ were then crossed to non-transgenic bp/bp mice. The progeny were genotyped for the lacZ transgene and the brachypodism mutation in separate PCR reactions, using the primers listed in Supplementary Table 8. Note that PCR reactions for the Gdf5 locus amplify the endogenous Gdf5 rather than the BAC sequences specifically since the reverse primer is positioned downstream of the IRES-B-Geo insertion at the 3’UTR of the Gdf5 transgene making amplification inefficient for the BAC. The brachypodism mutation was identified by sequencing around the bp site in the resulting PCR product. Mice that were bp/bp;BAC+, bp/bp;BAC- or bp/+;BAC- were compared as described below.

CRISPR-Cas9 Gene Targeting of the GROW1 Enhancer

Dual sgRNAs surrounding the mouse GROW1 regulatory element were designed using MIT CRISPR Tools (URL2), synthesized by Integrated DNA Technologies, Inc, and cloned into the PX458 vector following protocol[41] (step 5-part B). The sequence of both sgRNAs are listed in Supplementary Table 8. Guide RNAs were initially tested for ability to induce efficient deletions of the mouse GROW1 enhancer in cultured murine NIH3T3 cells. NIH3T3 cells were maintained in DMEM (Gibco) supplied with 10% FCS (Gibco) and 1% Pen/Strep (0.025%), and seeded 0.3 ×106 cells per each well of a 6-well plate 1 day prior to transfection. On the day of transfection, cells were incubated for 30 minutes in 0.5 ml of fresh culture medium, to which we then added 0.5 ml of a solution containing 1 μg sgRNA1-PX458 and 1 μg sgRNA2-PX458 in 250 μl Opti-MEM (Invitrogen), and 6 μl Lipofectamine® 2000 (Thermo Fisher Scientific) with 250μl Opti-MEM (Invitrogen). After 2 days of culture at 37°C, we scanned cells under a GFP-microscope to verify successful transfection and GFP expression. DNA was then extracted using E.Z.N.A Tissue DNA Kit, and the GROW1 region was amplified using touch down PCR and primers surrounding the guide RNA design sites (External Primers F1 and R1, Supplementary Table 8). Amplification products were isolated from 1% agarose gels (E.Z.N.A Gel Extraction Kit), and sequenced (Eton Bioscience) to verify GROW1 deletions. After confirmation that sgRNAs worked effectively in vitro, we performed in vitro transcription of sgRNAs following[60] to generate guides useful for DNA microinjection. These sgRNAs along with transfection-ready Cas9 mRNA (CAS500A-1; Systems Biosciences, LLC) were provided to the Harvard Genome Modification Facility for microinjection in wildtype C57BL/6J pro-nuclei. After F0 founder mice were obtained (N=3), we extracted DNA from tail clippings and performed genotyping using the F1 and R1 PCR primers as above. The wildtype amplification product was 2373 bp, and a subset of founder mice showed a deletion band of ~550 bp. Positive F0 mice were bred to C57BL/6J to transmit the mutations. F1 progeny from a single stable line were then intercrossed to generate F2 progeny possessing either wildtype, heterozygous, or homozygous GROW1 genotypes, which were verified using PCR amplification with F1/R1, F1/R2, and F2/R1 primers (Supplementary Table 8). All genotypes were acquired at expected Mendelian ratios. Sequencing confirmed that the tested GROW1 deletion removes ~1830 noncoding base pairs, corresponding to chr2:155,726,835–155,725,006 (mm9).

Skeletal Preparation and Measurements

Adult mice (P30 and P56) were stained with Alcian blue and Alizarin red and cleared as described[61]. The skin and visceral organs were removed and the animals were fixed in 95% ethanol. The specimens were then stained in a mixture of Alcian blue, acetic acid, and ethanol. After several washes in 95% ethanol, the samples were cleared in KOH and then stained by Alizarin red in KOH solution. Skeletons were stored in glycerol and when necessary, photographed. For the brachypodism BAC rescue experiment, skeletal element length and cranial length were measured with a micro-caliper under a dissecting microscope. Ten specimens (e.g., femora) for each genotype (i.e., homozygous bp animals without the BAC, homozygous bp animals with the BAC, and heterozygous bp animals without the BAC) were measured. Numbers of animals were chosen to provide multiple independent samples of all key genotypes. No randomization of animals to different treatment groups was necessary given the nature of the experiments. No animals were excluded from analysis, and blinding was not part of the study design. See Statistical Methods section for analytical details. To detect phenotypic differences in GROW1 deletion mice, N=12 30-day old male specimens for each genotype were collected, prepared for skeletons as above, and sent to the Center for Skeletal Research Imaging and Biomechanical Testing Core at the Massachusetts General Hospital for scanning by μCT. The right femur and tibia of each animal were scanned using a high-resolution desktop micro-tomographic imaging system (μCT40, SCANCO Medical AG, Brüttisellen, Switzerland) to assess bone morphology. The bones were imaged using a 12 μm[3] isotropic voxel size, 70 kVp peak x-ray tube intensity, 114 mA x-ray tube current, and 200 ms integration time. Scans were reconstructed using SCANCO GPU Reconstruction software and then DICOM images were exported for bone morphology measurements using Osirix MD V.7.5 (Pixemo SARL, Bernex, Switzerland). Next, in a blind experimental design in which the measurer (AMK) did not have access to genotypes, μCT scans of each femur and tibia were measured for Femoral Neck Length, Femoral Length, Femoral Head Height, and Tibial Length. In a separate round of measurements, investigator TDC measured long bone lengths on the same specimens using a micro-caliper under a dissecting microscope. Measurements were normalized in two different ways, based either on 1) the long axis of the third caudal vertebrae, a bone of the axial skeleton that was easily dissected, not under the control of the GROW1 element, and whose measurements were not statistically different between genotypes; or (2) the length of the clavicle, a skeletal element that forms predominately via intramembranous ossification, is not under the control of the GROW1 element, and whose length estimates were both highly reproducible and did not statistically differ between genotypes. See Statistical Methods section for analytical details.

Allele-Specific Expression Analyses

Timed matings were established between C57BL/6J GROW1+/− heterozygous mice and 129SVJ GROW1+/+ wild type mice. Pregnant females were sacrificed following IACUC approved protocols to acquire E15.5 embryos. Distal femoral growth plates from both sides of an individual animal were pooled, homogenized using a tissue homogenizer (QIAGEN), and RNA was isolated using the Trizol Reagent (15596–026, Ambion by Life Technologies) and the RNA Clean & Concentrator™-5 Kit (supplied with DNase I, Zymo) (N=4 biological replicates). Samples were then run on a bioanalyzer to achieve RNA Integrity Numbers greater than 8. These RNA samples were then reverse transcribed using a SuperScript IV First Strand cDNA Synthesis Reaction kit (18090010, Life Technologies) following manufacturer’s recommendations. Independently, tails from each embryo were used for GROW1 genotyping as described above. cDNA samples were then sent to EpigenDx for allele-specific expression assay design and execution. SNPs in the coding regions of Gdf5 (rs27340038), Cep250 (rs27339949), and Uqcc1 (rs3684985) were identified by EpigenDx. Pyrosequencing for SNP genotyping (PSQ H96A, Qiagen Pyrosequencing) is a real-time sequencing-based DNA analysis that quantitatively determining the genotypes of a single or multiple mutations in a single reaction. Briefly, 1 ng of sample cDNA was used for PCR amplification. PCR was performed using 10X PCR buffer (Qiagen Inc.) at 3.0 mM MgCl2, 200M of each dNTP, 0.2 μM each of the forward and reverse primers (available through EpigenDx), and 0.75 U of HotStar DNA polymerase (Qiagen Inc.), per 30 ul reaction. PCR cycling conditions were: 94°C 15 min; 45 cycles of 94°C 30 s; 60°C 30 s; 72°C 30 s; 72°C 5 min. One of the PCR primer pairs was biotinylated to convert the PCR product to single-stranded DNA sequencing templates using streptavidin beads and the PyroMark Q96 Vacuum Workstation. 10 μl of the PCR products were bound to streptavidin beads and the single strand containing the biotinylated primer was isolated and combined with a specific sequencing primer (available through EpigenDx). The primed single stranded DNA was sequenced using the Pyrosequencing PSQ96 HS System (Qiagen Pyrosequencing) following the manufacturer’s instructions (Qiagen Pyrosequencing). The genotypes of each sample were analyzed using Q96 software AQ module (Qiagen Pyrosequencing). Pyrosequencing results for each SNP were used to calculate the allelic ratios of 129SVJ (wildtype allele) to C57BL/6J (GROW1 null allele) in the heterozygous state. Each heterozygous ratio of cDNA products was then normalized by the ratio of WT 129SVJ to C57BL/6J genomic products, amplified from known 1:1 mixtures of each sequence. See Statistical Methods for analytical details.

Cloning and Construct Preparation for In Vitro Assays

For in vitro transfection into Chon-002 cells (see below), primers containing NotI linker sequences were used to amplify 3X copy inserted ancestral human GROW1B and derived human GROW1B sequences (from each modified p5’NotI hsp68 lacZ plasmid studied in vivo, see above and Supplementary Table 8) and ligated into a NotI containing pGL4.23 firefly luciferase vector. Ligates for constructs containing inserted or non-inserted (“empty”) sequence were next transformed into DH10B cells, streaked on ampicillin plates, single colonies were then picked, PCR screened, sequenced, and finally purified using Endo-Free Maxi-Prep kits (Qiagen).

In Vitro Assay Using Chon-002 Cells

Chon-002 cells derived from human fetal female 18 week-old femoral growth plate chondrocytes were obtained from ATCC (Cat#CRL-2847), and cultured at 5% CO2 at 37C in ATCC complete growth media containing Dulbecco’s Modified Eagle’s Medium (DMEM), 10% fetal bovine serum (FBS), 50mg/ml penicillin-streptomycin (P/S), and 0.1mg/ml G-418 (Geneticin). Media was replaced every two days and the cells were subcultured every five days. We received certified passage 25 cells from ATCC and utilized passage 26–29 cells for transfection experiments without additional STR authentification or mycoplasma testing. Prior to transfection, cells were first cultured in DMEM with 10% FBS for 24hr in 24-well dishes at a seeding density of 2×105 cells/well. Next, in DMEM only, cells were transiently transfected with varying amounts of ancestral or derived firefly luciferase reporter vector (i.e., 100ng, 300ng, or 500ng) plus empty pGL4.23 luciferase vector (to a total concentration of 500ng of pGL4.23 luciferase vector per well) along with prl-CMV renilla luciferase vector for transfection efficiency (Promega; Cat#E226A) using Fugene6 transfection reagent at a Fugene6:DNA ratio of 3:1(Promega; Cat# E269A) following the manufacturer’s recommended protocols. Enhancer activity was measured 24 hours after transfection using the Dual-Luciferase Reporter Assay System (Promega,TM040) following manufacturers protocols on a Dual Injector GloMax Multi Luminometer System (Promega; Cat# BASE 9311–011 and DUAL INJECTOR 9301–062). See Statistical Methods for analytical details.

Computational Methods

SNP HGVS Accession Codes

rs143383:NC_000020.10:g.34025983A>G rs143384:NC_000020.10:g.34025756A>G rs4911178:NC_000020.10:g.33952620G>A rs6088813:NC_000020.10:g.33975181C>A rs725908:NC_000020.10:g.33968067T>C rs6060373:NC_000020.10:g.33914208A>G rs6088792:NC_000020.10:g.33909784C>T rs6060369:NC_000020.10:g.33907161T>C

Predicted Transcription Factor Binding Sites

To identify upstream transcription factors (TF) predicted to bind at the rs4911178 site, we used HaploRegV4.1[62] with default parameters (URL3). We also queried the UniProbe database (URL4), which is based on replicate experimental measurements of in vitro binding interactions between expressed TFs and all possible 8-mer target oligonucleotides[63-64]. For the UniProbe analysis, two 15 base-pair sequences differing only at rs4911178 (Ancestral (ANC) GTCACAGGGATTATA and Derived (DER) GTCACAGAGATTATA were analyzed using Score Threshold: 0.2 and Species: Homo sapiens. This produced 703 TF predictions for ANC, 390 TF predictions for DER (Supplementary Table 7). For a TF binding site to be considered reduced or gained, the site must exhibit an enrichment score change of greater than at least 0.10, which corresponded to significant changes in Zscore (see Supplementary Fig.7). Lists of potential upstream regulators exhibiting marked reduction/gain in binding affinity were then screened using expression and phenotypic data in order to identify those TFs also expressed in, or functionally required, in growth plates. To carry out this analysis we used data in VisiGene (URL5), Eurexpress (URL6), Genepaint (URL7), and the Mouse Genome Informatics expression and phenotypic databases (URL8). Two factors, PITX1 and HOXD13 met the above criteria. Each had enrichment scores above the recommended UniProbe threshold of 0.4 for the ANC sequence but dropped below 0.3 for the DER sequence, and this corresponded to a significant change in Zscore (Supplementary Fig.7). Pitx1 is expressed in hindlimb growth plates and is required for normal development of hindlimb long bones in humans and mice[43]. Hoxd13 is also expressed in growth plates[65] and is required for normal long bone development[66].

1000 Genomes Project Data Retrieval and SNP Frequency Calculations

We focused on a sequence window spanning 200 kb upstream and 200 kb downstream of the GDF5 locus (chr20:33,752,620 to chr20:34,152,620; hg19). Phased haplotype data were retrieved for SNPs across this region via the 1000 Genomes resource[33]. The raw data (VCF format) is available online (URL9). Twelve data files were retrieved: one dataset (Primary) included genotype data for all 8 non-admixed populations (CEU, GBR, TSI, CHS, CHB, JPT, YRI, LWK); eight datasets were divided by populations; three were divided by continents (European - EUR, Asian - ASI, African - AFR). Admixed populations (ASW, CLM, FIN, IBS, MXL, PUR) were removed. Related individuals were also removed from all 1000 Genomes phased haplotype data and any subsequent analyses. The Primary dataset (all 8 populations) consisted of 1486 genotypes (743 individuals X 2 genotypes per individual) and 4,066 SNPs, identified across this 400 kb window. This dataset was used to compute continent and population-specific SNP frequencies. 1,370 and 622 SNPs were present at overall frequencies >= 0.01 and 0.05, respectively. We next ran Haploview (see below) and detected a smaller haplobock at chr20: 33,887,955–34,025,983 (138,029 bases). Within this haploblock there were 1354 SNPs. Based on MAF, we acquired 174 SNPs (MAF > 0.05) and 1489 genotypes, and 395 SNPs (MAF > 0.01) and 1489 genotypes. After removing all rare recombinant, haplotypes and using MAF >0.05, we generated a Reduced Dataset of 1091 human haplotypes plus chimp, Neandertal and Denisovan (total 1094 haplotypes, 174 SNPs). This resulted in removal of putative recombinants confirmed through visual inspection, which can otherwise complicate phylogenetic analysis (see below).

Analysis of Protein Coding Substitutions in the Locus

Analysis of protein coding substitutions involved using data collected from several existing databases (UCSC browser coding SNPs, and dbSNP) as well de novo detection of polymorphisms from 1000 Genome databases (Pilot 1000 Genome variant calls, URL9). Given the presence of a high number of exceptionally rare (MAF < 0.0001) to rare (MAF <.01) variants from both de novo and existing databases, functional predictions were made only for those polymorphisms in which MAF > 0.05 for each continent, global MAF > 0.01, and those which were not disqualified due to ascertainment and variant call issues[33]. Each coding substitution was then functionally analyzed using Poly-Phen2[67] and SIFT[68] databases. See Supplementary Table 1 (sheet 3).

Linkage Disequilibrium (LD) and Haploblock Detection

Separately for each individual population and continent, we computed LD between all pairwise SNPs using Haploview 4.2[69] with default parameters (HW p-value cutoff = 0.001, Min genotype %: 75, Max # Mendel errors: 1), and MAF >= 0.05 (as this parameter was also used in subsequent haploblock detection calculations). Haploblocks were then calculated using the default algorithm[70], which detects blocks with 95% of informative comparisons in “strong LD”. A large haploblock containing SNP rs4911178 was detected consistently across European and Asian populations. This also represented the largest observed haploblock across the 400 kb interval. This haploblock was then extended to the maximal borders detected in any population, which occurred in the JPT population and encompassed the region: chr20:33,887,955–34,025,983. These results were nearly identical to those identified on much larger genomic regions using both UCSC hg18 genome browser LD annotations, and HGDP online tools. Additional LD scores reported in this manuscript were also computed with Haploview using the above parameters unless otherwise noted.

Ancient Sequences

High coverage Neandertal[39] and Denisovan[40] sequence data were acquired (URL10 and URL11). Additional low coverage Neandertal sequence data were acquired from the Neandertal Sequencing Project (URL12) and utilized ancient DNA acquired from six distinct individuals[38]. Three individuals were acquired from Vindija cave in Croatia (Vi33.16, ~ 54.1% genome coverage; Vi33.25, ~ 46.6% genome coverage; Vi33.26, ~ 45.2% genome coverage), one was from the Neandertal type specimen from the Neander Valley in Germany (Feld1, ~ 0.1% genome coverage), one was from El Sidron cave in Asturias, Spain (Sid1253, ~ 0.1% genome coverage) and one was from Mezmaiskaya in the Altai Mountains, Russia (Mez1, ~ 2% genome coverage)[38]. All sequence for the GDF5 interval from these six individuals as well as the Altai Neandertal and Denisovan were extracted specifically using SAMtools with a region specifier (URL13). Extracted sequences were re-aligned to hg19 and consensus sequence for each hominin for the region was built. Missing sequence or gaps (due to human-specific insertions) were designated with an “N” in each archaic hominin consensus.

Haplotype Phylogeny and Visualization

A multiple alignment of 1094 individuals (Reduced Dataset) was constructed for SNPs with MAF >=0.05 across the haploblock region detected above (chr20:33,887,955–34,025,983) for each individual in dataset. Corresponding chimp (panTro3), Neandertal and Denisovan alleles were also included. Positions containing a gap (−) or missing information (N) in archaic hominin sequences (and a small number of cases in chimp) were removed. This resulted in an alignment of 174 SNPs with complete allele information for 1091 human genotypes, 1 Neandertal, 1 Denisovan, and 1 chimp sequence (Reduced Dataset). A maximum-likelihood (ML) tree was then constructed using RAxML with the GTR+WAG model of evolution. The chimp sequence was defined as an outgroup. RAxML rapid bootstrapping was performed to provide a measure of clade support. Visual genotypes were generated by classifying SNP values as derived or ancestral based on pre-calculated ancestral alleles from Ensembl compara (originally calculated using a 6-way primate alignment). Visual genotypes were then mapped onto the phylogenetic trees in order to trace patterns of SNP evolution.

Statistical Methods

All computational statistics and visualizations were performed using SPSS Statistics 17.0, Microsoft Excel, or R, following verification of normality and approximately equivalent levels of variance between sample groups. For BAC rescue and GROW1 deletion experiments, the indicated p-values are based on unpaired, two-tailed t-tests. Box and whisker plots for each experiment (Fig. 2 and Fig. 7; Supplementary Fig.4) show the upper and lower quartiles spanning the interquartile range (edges of the box) with the median being marked by a horizontal line within each box, and the whiskers showing the maximum and minimum values from each experiment. For cell culture experiments, to compare expression between the ancestral GROW1B construct to either the derived GROW1B construct (Fig. 4) or the empty vector construct (Supplementary Fig.7), we performed at least 5 independent transfection experiments containing 8 technical replicates (i.e., individual wells of a 24 well dish) per construct per concentration, and did so at 4 distinct reporter concentrations (i.e. 100ng, 200ng, 300ng, and 500ng). For each concentration, we compared the mean expression of the ancestral construct to that of the mean expression of the empty vector construct or derived construct using a two sample directional Student’s T-test. The p-values from independent experiments were combined across either five (for the empty vector construct comparison) or seven (for the derived construct comparison) experiments (at each concentration) using Fisher’s combined probability test. Box and whisker plots (Fig. 4 and Supplementary Fig.7) show the upper and lower quartiles spanning the interquartile range (edges of the box) with the median being marked by a horizontal line within each box, and the whiskers showing the maximum and minimum values. For the Allele-Specific Expression analysis, the Permutation Test, a non-parametric measure, was used to determine significance between the WT and heterozygous allelic expression ratio using the {perm} module in R.

Referenced URLs

URL1:http://www.jax.org URL2:http://crispr.mit.edu URL3:http://archive.broadinstitute.org/mammals/haploreg/haploreg.php URL4:http://the_brain.bwh.harvard.edu/uniprobe/about.php URL5:http://genome.ucsc.edu/cgi-bin/hgVisigene URL6:http://www.eurexpress.org/ee/ URL7:http://www.genepaint.org/Frameset.html URL8:http://www.informatics.jax.org URL9:ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr20.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz URL10:http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal/VCF/ URL11:http://cdna.eva.mpg.de/denisova/VCF/ URL12:http://genome.ucsc.edu/Neandertal/ URL13:http://samtools.sourceforge.net/
  68 in total

1.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits.

Authors:  Lucia A Hindorff; Praveen Sethupathy; Heather A Junkins; Erin M Ramos; Jayashri P Mehta; Francis S Collins; Teri A Manolio
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-27       Impact factor: 11.205

2.  Many sequence variants affecting diversity of adult human height.

Authors:  Daniel F Gudbjartsson; G Bragi Walters; Gudmar Thorleifsson; Hreinn Stefansson; Bjarni V Halldorsson; Pasha Zusmanovich; Patrick Sulem; Steinunn Thorlacius; Arnaldur Gylfason; Stacy Steinberg; Anna Helgadottir; Andres Ingason; Valgerdur Steinthorsdottir; Elinborg J Olafsdottir; Gudridur H Olafsdottir; Thorvaldur Jonsson; Knut Borch-Johnsen; Torben Hansen; Gitte Andersen; Torben Jorgensen; Oluf Pedersen; Katja K Aben; J Alfred Witjes; Dorine W Swinkels; Martin den Heijer; Barbara Franke; Andre L M Verbeek; Diane M Becker; Lisa R Yanek; Lewis C Becker; Laufey Tryggvadottir; Thorunn Rafnar; Jeffrey Gulcher; Lambertus A Kiemeney; Augustine Kong; Unnur Thorsteinsdottir; Kari Stefansson
Journal:  Nat Genet       Date:  2008-04-06       Impact factor: 38.330

3.  Differential staining of cartilage and bone in whole mouse fetuses by alcian blue and alizarin red S.

Authors:  M J McLeod
Journal:  Teratology       Date:  1980-12

4.  Mutations in CDMP1 cause autosomal dominant brachydactyly type C.

Authors:  A Polinkovsky; N H Robin; J T Thomas; M Irons; A Lynn; F R Goodman; W Reardon; S G Kant; H G Brunner; I van der Burgt; D Chitayat; J McGaughran; D Donnai; F P Luyten; M L Warman
Journal:  Nat Genet       Date:  1997-09       Impact factor: 38.330

5.  Genetic variation in the GDF5 region is associated with osteoarthritis, height, hip axis length and fracture risk: the Rotterdam study.

Authors:  R B A Vaes; F Rivadeneira; J M Kerkhof; A Hofman; H A P Pols; A G Uitterlinden; J B J van Meurs
Journal:  Ann Rheum Dis       Date:  2008-11-24       Impact factor: 19.103

6.  Common variants in the GDF5-UQCC region are associated with variation in human height.

Authors:  Serena Sanna; Anne U Jackson; Ramaiah Nagaraja; Cristen J Willer; Wei-Min Chen; Lori L Bonnycastle; Haiqing Shen; Nicholas Timpson; Guillaume Lettre; Gianluca Usala; Peter S Chines; Heather M Stringham; Laura J Scott; Mariano Dei; Sandra Lai; Giuseppe Albai; Laura Crisponi; Silvia Naitza; Kimberly F Doheny; Elizabeth W Pugh; Yoav Ben-Shlomo; Shah Ebrahim; Debbie A Lawlor; Richard N Bergman; Richard M Watanabe; Manuela Uda; Jaakko Tuomilehto; Josef Coresh; Joel N Hirschhorn; Alan R Shuldiner; David Schlessinger; Francis S Collins; George Davey Smith; Eric Boerwinkle; Antonio Cao; Michael Boehnke; Gonçalo R Abecasis; Karen L Mohlke
Journal:  Nat Genet       Date:  2008-01-13       Impact factor: 38.330

7.  The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors:  Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal:  Nature       Date:  2013-12-18       Impact factor: 49.962

8.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

9.  Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size.

Authors:  Nicole Soranzo; Fernando Rivadeneira; Usha Chinappen-Horsley; Ida Malkina; J Brent Richards; Naomi Hammond; Lisette Stolk; Alexandra Nica; Michael Inouye; Albert Hofman; Jonathan Stephens; Eleanor Wheeler; Pascal Arp; Rhian Gwilliam; P Mila Jhamai; Simon Potter; Amy Chaney; Mohammed J R Ghori; Radhi Ravindrarajah; Sergey Ermakov; Karol Estrada; Huibert A P Pols; Frances M Williams; Wendy L McArdle; Joyce B van Meurs; Ruth J F Loos; Emmanouil T Dermitzakis; Kourosh R Ahmadi; Deborah J Hart; Willem H Ouwehand; Nicholas J Wareham; Inês Barroso; Manjinder S Sandhu; David P Strachan; Gregory Livshits; Timothy D Spector; André G Uitterlinden; Panos Deloukas
Journal:  PLoS Genet       Date:  2009-04-03       Impact factor: 5.917

10.  UniPROBE: an online database of protein binding microarray data on protein-DNA interactions.

Authors:  Daniel E Newburger; Martha L Bulyk
Journal:  Nucleic Acids Res       Date:  2008-10-08       Impact factor: 16.971

View more
  33 in total

Review 1.  Complex Phenotypes: Mechanisms Underlying Variation in Human Stature.

Authors:  Pushpanathan Muthuirulan; Terence D Capellini
Journal:  Curr Osteoporos Rep       Date:  2019-10       Impact factor: 5.096

Review 2.  A century of development.

Authors:  Joan T Richtsmeier
Journal:  Am J Phys Anthropol       Date:  2018-04       Impact factor: 2.868

3.  Functional Deficits in Mice Expressing Human Interleukin 8.

Authors:  Julie Michelle Brent; Zuozhen Tian; Lutian Yao; Jian Huang; Dessislava Z Markova; Frances S Shofer; Angela K Brice; Ling Qin; Carla R Scanzello; Flavia Vitale; Di Chen; Yejia Zhang
Journal:  Comp Med       Date:  2020-04-20       Impact factor: 0.982

Review 4.  Genetics of scapula and pelvis development: An evolutionary perspective.

Authors:  Mariel Young; Licia Selleri; Terence D Capellini
Journal:  Curr Top Dev Biol       Date:  2019-01-07       Impact factor: 4.897

5.  Molecular signatures of selection on the human GLI3 associated central nervous system specific enhancers.

Authors:  Irfan Hussain; Rabail Zehra Raza; Shahid Ali; Muhammad Abrar; Amir Ali Abbasi
Journal:  Dev Genes Evol       Date:  2021-03-02       Impact factor: 0.900

6.  Evolutionary Selection and Constraint on Human Knee Chondrocyte Regulation Impacts Osteoarthritis Risk.

Authors:  Daniel Richard; Zun Liu; Jiaxue Cao; Ata M Kiapour; Jessica Willen; Siddharth Yarlagadda; Evelyn Jagoda; Vijaya B Kolachalama; Jakob T Sieker; Gary H Chang; Pushpanathan Muthuirulan; Mariel Young; Anand Masson; Johannes Konrad; Shayan Hosseinzadeh; David E Maridas; Vicki Rosen; Roman Krawetz; Neil Roach; Terence D Capellini
Journal:  Cell       Date:  2020-03-26       Impact factor: 41.582

Review 7.  Osteoarthritis year in review 2017: genetics and epigenetics.

Authors:  M J Peffers; P Balaskas; A Smagul
Journal:  Osteoarthritis Cartilage       Date:  2017-10-06       Impact factor: 6.576

Review 8.  The Myth of Optimality in Clinical Neuroscience.

Authors:  Avram J Holmes; Lauren M Patrick
Journal:  Trends Cogn Sci       Date:  2018-02-20       Impact factor: 20.229

9.  The effect of common variants in GDF5 gene on the susceptibility to chronic postsurgical pain.

Authors:  Shaoyao Yan; Huiyong Nie; Gang Bu; Weili Yuan; Suoliang Wang
Journal:  J Orthop Surg Res       Date:  2021-07-01       Impact factor: 2.359

10.  Joint disease-specificity at the regulatory base-pair level.

Authors:  Pushpanathan Muthuirulan; Dewei Zhao; Mariel Young; Daniel Richard; Zun Liu; Alireza Emami; Gabriela Portilla; Shayan Hosseinzadeh; Jiaxue Cao; David Maridas; Mary Sedlak; Danilo Menghini; Liangliang Cheng; Lu Li; Xinjia Ding; Yan Ding; Vicki Rosen; Ata M Kiapour; Terence D Capellini
Journal:  Nat Commun       Date:  2021-07-06       Impact factor: 14.919

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.