Chenxing Liu, Myoung Keun Lee, Sahin Naqvi, Hanne Hoskens, Dongjing Liu, Julie D White, Karlijne Indencleef, Harold Matthews, Ryan J Eller, Jiarui Li, Jaaved Mohammed, Tomek Swigut, Stephen Richmond, Mange Manyama, Benedikt Hallgrímsson, Richard A Spritz, Eleanor Feingold, Mary L Marazita, Joanna Wysocka, Susan Walsh, Mark D Shriver, Peter Claes, Seth M Weinberg, John R Shaffer.
Abstract
Facial morphology is highly variable, both within and among human populations, and a sizable portion of this variation is attributable to genetics. Previous genome scans have revealed more than 100 genetic loci associated with different aspects of normal-range facial variation. Most of these loci have been detected in Europeans, with few studies focusing on other ancestral groups. Consequently, the degree to which facial traits share a common genetic basis across diverse sets of humans remains largely unknown. We therefore investigated the genetic basis of facial morphology in an East African cohort. We applied an open-ended data-driven phenotyping approach to a sample of 2,595 3D facial images collected on Tanzanian children. This approach segments the face into hierarchically arranged, multivariate features that capture the shape variation after adjusting for age, sex, height, weight, facial size and population stratification. Genome scans of these multivariate shape phenotypes revealed significant (p < 2.5 × 10⁻⁸) signals at 20 loci, which were enriched for active chromatin elements in human cranial neural crest cells and embryonic craniofacial tissue, consistent with an early developmental origin of the facial variation. Two of these associations were in highly conserved regions showing craniofacial-specific enhancer activity during embryological development (5q31.1 and 12q21.31). Six of the 20 loci surpassed a stricter threshold accounting for multiple phenotypes with study-wide significance (p < 6.25 × 10⁻¹⁰). Cross-population comparisons indicated 10 association signals were shared with Europeans (seven sharing the same associated SNP), and facilitated fine-mapping of causal variants at previously reported loci. Taken together, these results may point to both shared and population-specific components to the genetic architecture of facial variation.
Year: 2021 PMID: 34411106 PMCID: PMC8375984 DOI: 10.1371/journal.pgen.1009695
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Facial segmentation and GWAS results.
(a) Rosette showing the global-to-local partitioning of the full face into segments. The full face (segment 1, red) is first partitioned into segments representing the outer (2, orange) and inner (3, cyan) regions of the face. These are in turn partitioned into more localized regions representing the lower face (magenta), upper face (salmon), nose (blue), and mouth and eyes (green). (b) Combined Manhattan plot highlighting the genome-wide significant genetic variants across 63 facial segments. Significantly associated variants are colored to correspond to the facial segments as shown in (a). The blue dotted line and red solid line indicate the genome-wide (P < 2.5 × 10⁻⁸) and study-wide (P < 6.25 × 10⁻¹⁰) significance thresholds, respectively.
Summary of 20 GWAS signals in Tanzania.
T1 to T5: replication p-values in the European sample.

| Locus | Lead SNP | Chr | Position | A1 | A2 | MAF | MAF (Euro) | Candidate genes | Best P | Best mod (num mods) | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1q22 | rs58409393 | 1 | 155025307 | G | A | 0.05 | 0 | ADAM15 | 1.6E-8 | 41 (1) | NA | NA | | NA | 8.8E-05 |
| 3p14.3 | rs56063440 | 3 | 54731374 | C | G | 0.37 | 0.28 | ESRG | 9.7E-9 | 52 (3) | | | | | 1.5E-05 |
| 3q21.3 | chr3:127963189 | 3 | 127963189 | T | TGC | 0.34 | NA | EEFSEC | 1.5E-11 | 27 (3) | | | | | |
| 3q28 | rs112643361 | 3 | 188438871 | G | A | 0.10 | 0 | NA | 1.8E-8 | 21 (1) | NA | NA | 3.7E-07 | NA | 7.9E-04 |
| 4p15.2 | chr4:24163580 | 4 | 24163580 | G | GAT | 0.18 | NA | NA | 8.9E-9 | 53 (1) | NA | NA | 5.2E-06 | NA | 2.3E-03 |
| 4q31.3 | rs9995821 | 4 | 154828366 | C | T | 0.19 | 0.22 | DCHS2 | 2.5E-22 | 27 (8) | | | | | |
| 5q14.3 | rs11959408 | 5 | 89964298 | T | C | 0.28 | 0.34 | GRP98 | 1.1E-8 | 43 (1) | 0.92 | 0.002 | 5.5E-06 | 0.013 | 2.6E-04 |
| 5q31.1 | rs113199279 | 5 | 134806314 | T | G | 0.11 | 0 | CXCL14 | 2.1E-8 | 28 (1) | NA | NA | 7.9E-07 | NA | 9.5E-06 |
| 7q22.1 | rs114777090 | 7 | 102901689 | G | A | 0.14 | 0 | NA | 8.2E-9 | 18 (1) | 0.49 | 0.004 | 3.3E-06 | 0.26 | 1.6E-04 |
| 9p21.3 | rs10122939 | 9 | 20300843 | G | A | 0.28 | 0.004 | MLLT3, FOCAD | 3.3E-10 | 48 (5) | 0.02 | 0.02 | 4.8E-05 | 0.21 | 2.5E-04 |
| 9q21.33 | rs188502472 | 9 | 86936444 | T | C | 0.03 | 0.001 | NA | 2E-9 | 3 (1) | NA | NA | 5.0E-06 | NA | 1.0E-03 |
| 10p15.3 | chr10:1582881 | 10 | 1582881 | AC | A | 0.06 | NA | NA | 2.7E-9 | 4 (1) | NA | NA | 6.6E-07 | NA | 9.6E-05 |
| 10q26.11 | rs242980 | 10 | 119281243 | A | G | 0.34 | 0.17 | EMX2 | 1.5E-11 | 1 (2) | | | | | |
| 12q14.3 | rs10878346 | 12 | 66320873 | A | G | 0.49 | 0.25 | HMGA2 | 5.5E-12 | 1 (4) | | | | | |
| 12q21.31 | rs74112009 | 12 | 85808404 | A | T | 0.46 | 0.06 | ALX1 | 1.8E-15 | 30 (6) | | | | | |
| 12q24.21 | rs80243479 | 12 | 115356683 | C | T | 0.04 | 0 | TBX3 | 2.1E-8 | 14 (1) | NA | NA | | NA | 9.4E-04 |
| 13q13.3 | rs9603276 | 13 | 38481292 | G | A | 0.05 | 0.4 | LINC00571 | 1.5E-9 | 11 (1) | 0.99 | 0.0002 | 3.1E-05 | 0.1 | 8.8E-04 |
| 13q32.3 | rs148390647 | 13 | 100542948 | G | C | 0.01 | 0 | ZIC5 | 1.4E-8 | 59 (1) | NA | NA | | NA | 5.4E-05 |
| 18q22.1 | rs77926594 | 18 | 63466440 | A | G | 0.02 | 0 | NA | 1.6E-8 | 40 (1) | NA | NA | 4.4E-06 | NA | 9.8E-04 |
| 20p11.22 | rs16983329 | 20 | 22035197 | A | G | 0.28 | 0.03 | FOXA2 | 1.5E-8 | 54 (2) | | | | | |
a. Chromosome coordinates are based on human genome build 19.
b. MAF values for the European population are based on 1000 Genomes Phase 3 data downloaded from dbSNP; "NA" if there is no frequency data in dbSNP.
c. Plausible candidate genes were determined from functional evidence in previous publications and from colocalization results with eQTLs.
d. Best mod: the facial segment where the lowest p-value was found (num mods: the number of facial segments where genome-wide significance was reached).
e. Association result from the SNP-level testing of the projection of the Tanzania dataset onto the European-derived phenotype (T1 in the Methods section).
f. Association result from the SNP-level look-up of the "best segment" (T2).
g. Association result from the locus-level look-up of the "best segment" (T3).
h. Association result from the SNP-level look-up for a qualitatively similar facial segment (T4).
i. Association result from the locus-level look-up for a qualitatively similar facial segment (T5).
j. The full results for T1-T5 are given in S3 Table.
k. The lead SNPs are not available in European samples.
P-values in bold indicate replicated associations after Bonferroni correction.
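The two significance thresholds can be checked against the "Best P" column of the table. The Python sketch below is illustrative only: the dictionary is transcribed from the table, while the names `best_p`, `classify`, and `study_wide_loci` are hypothetical. It also computes the ratio between the two thresholds (40). Reading that ratio as a Bonferroni-style adjustment for 40 effectively independent phenotypes is an assumption, not something stated in this record.

```python
# Thresholds quoted in the abstract and in the Fig 1 caption.
GENOME_WIDE = 2.5e-8
STUDY_WIDE = 6.25e-10

# "Best P" per locus, transcribed from the summary table.
best_p = {
    "1q22": 1.6e-8, "3p14.3": 9.7e-9, "3q21.3": 1.5e-11, "3q28": 1.8e-8,
    "4p15.2": 8.9e-9, "4q31.3": 2.5e-22, "5q14.3": 1.1e-8, "5q31.1": 2.1e-8,
    "7q22.1": 8.2e-9, "9p21.3": 3.3e-10, "9q21.33": 2e-9, "10p15.3": 2.7e-9,
    "10q26.11": 1.5e-11, "12q14.3": 5.5e-12, "12q21.31": 1.8e-15,
    "12q24.21": 2.1e-8, "13q13.3": 1.5e-9, "13q32.3": 1.4e-8,
    "18q22.1": 1.6e-8, "20p11.22": 1.5e-8,
}


def classify(p: float) -> str:
    """Label a p-value by the strictest threshold it passes."""
    if p < STUDY_WIDE:
        return "study-wide"
    if p < GENOME_WIDE:
        return "genome-wide"
    return "not significant"


# Loci whose best p-value passes the stricter, study-wide threshold.
study_wide_loci = sorted(locus for locus, p in best_p.items()
                         if classify(p) == "study-wide")

# Ratio of the two thresholds; interpreting it as a count of effective
# independent tests is an assumption.
threshold_ratio = round(GENOME_WIDE / STUDY_WIDE)

print(len(best_p), "loci,", len(study_wide_loci), "study-wide:", study_wide_loci)
print("threshold ratio:", threshold_ratio)
```

Under these inputs, six loci pass the study-wide threshold, in agreement with the abstract.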
Fig 2. Colocalization plots between Tanzania GWAS and eQTL sites of the EEFSEC gene in "Skin - sun exposed" tissue.
Note, eQTL results from one representative tissue are shown; similar eQTL signals were observed across multiple tissues and/or cells. The top right plot (b) shows the association results in the Tanzania GWAS; the bottom right plot (c) represents the eQTL results; the left plot (a) shows the colocalization of genetic association and eQTL signals. The SNP indicated by the purple diamond is the SNP for which the African LD information is shown.
Fig 3. The 20 GWAS loci are enriched for enhancers preferentially active in cranial neural crest cells and embryonic craniofacial tissue.
Boxplots indicate H3K27ac signal (log-transformed coverage) in the vicinity of the 20 GWAS loci (within 20kb) in individual samples; cranial neural crest cells and embryonic craniofacial samples are colored blue and orange, respectively. The dashed line at ~2.5 is the median signal across all cell types and tissues.
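One way to read "within 20kb" is as a symmetric window around each lead SNP. The Python sketch below assumes a window of ±20 kb around the build 19 position taken from the summary table. The `Window`, `flank`, and `overlaps` names and the example peak interval are hypothetical, chosen only to show the interval test. It is not the authors' pipeline.

```python
from typing import NamedTuple


class Window(NamedTuple):
    chrom: str
    start: int
    end: int


def flank(chrom: str, pos: int, half_width: int = 20_000) -> Window:
    """Symmetric window around a SNP position (assumed meaning of 'within 20kb')."""
    return Window(chrom, max(1, pos - half_width), pos + half_width)


def overlaps(window: Window, chrom: str, start: int, end: int) -> bool:
    """True if the closed interval [start, end] on chrom intersects the window."""
    return chrom == window.chrom and start <= window.end and end >= window.start


# Lead SNP positions for the two conserved-enhancer loci, from the summary table.
lead_snps = {"5q31.1": ("5", 134806314), "12q21.31": ("12", 85808404)}
windows = {locus: flank(chrom, pos) for locus, (chrom, pos) in lead_snps.items()}

# Hypothetical H3K27ac peak interval, used only to demonstrate the overlap test.
example_peak = ("5", 134_800_000, 134_801_000)
hits = [locus for locus, w in windows.items() if overlaps(w, *example_peak)]
print(windows["5q31.1"], hits)
```

Signal values for intervals selected this way could then be summarized per sample, as in the boxplots.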
Fig 4. LocusCompare visualizations of colocalization between Tanzania GWAS and European GWAS at (a-c) 3q21.3 and (d-f) 12q21.31. The top right plots (b and e) show the association results in the Tanzania GWAS; the bottom right plots (c and f) represent the corresponding results in the European GWAS; the left plots (a and d) are visualizations of colocalization. For each locus, the SNP indicated by the purple diamond is the SNP for which the LD information is shown, with African LD structure indicated in the colocalization plot. The vertical gray dashed lines indicate the p-values of SNPs from the Tanzania GWAS that were unavailable in the European GWAS; the horizontal gray dashed lines indicate the p-values of SNPs from the European GWAS that were unavailable in the Tanzanian GWAS.