Susan Walsh, Lakshmi Chaitanya, Krystal Breslin, Charanya Muralidharan, Agnieszka Bronikowska, Ewelina Pospiech, Julia Koller, Leda Kovatsi, Andreas Wollstein, Wojciech Branicki, Fan Liu, Manfred Kayser.
Abstract
Human skin colour is highly heritable and externally visible, with relevance in medical, forensic, and anthropological genetics. Although eye and hair colour can already be predicted with high accuracy from small sets of carefully selected DNA markers, knowledge about the genetic predictability of skin colour is limited. Here, we investigate the skin colour predictive value of 77 single-nucleotide polymorphisms (SNPs) from 37 genetic loci previously associated with human pigmentation using 2025 individuals from 31 global populations. We identified a minimal set of 36 highly informative skin colour predictive SNPs and developed a statistical prediction model capable of skin colour prediction on a global scale. Average cross-validated prediction accuracies, expressed as area under the receiver-operating characteristic curve (AUC) ± standard deviation, were 0.97 ± 0.02 for Light, 0.83 ± 0.11 for Dark, and 0.96 ± 0.03 for Dark-Black. Using a 5-category scale, the corresponding values were 0.74 ± 0.05 for Very Pale, 0.72 ± 0.03 for Pale, 0.73 ± 0.03 for Intermediate, 0.87 ± 0.1 for Dark, and 0.97 ± 0.03 for Dark-Black. A comparative analysis in 194 independent samples from 17 populations demonstrated that our model outperformed a previously proposed 10-SNP-classifier approach: AUCs rose from 0.79 to 0.82 for White, were comparable for the intermediate level (0.63 and 0.62, respectively), and increased markedly from 0.64 to 0.92 for Black. Overall, this study demonstrates that the chosen DNA markers and prediction model, particularly at the 5-category level, allow skin colour predictions within and between continental regions for the first time, which will serve as a valuable resource for future applications in forensic and anthropological genetics.
Year: 2017 PMID: 28500464 PMCID: PMC5487854 DOI: 10.1007/s00439-017-1808-5
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
DNA variant information for 77 SNPs previously associated with human pigmentation variation including their location, citations, as well as skin colour association and prediction ranking details obtained from the present study
| No. | SNP | Chromosome | Gene | Alleles | BP (GRCh38) | Reference pigmentation association | Skin colour correlation (p value) | Ranking in final model | Coefficient (fitted glm) | p value (fitted glm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | rs6679651 | 1 | HIST2H2BF | C/T | 149,757,453 | | ns | | | |
| 2 | rs12233134 | 2 | EFR3B | C/T | 25,106,146 | Quillen et al. ( | ns | |||
| 3 | rs40132 | 5 | SLC45A2 | A/G | 33,950,597 | Nan et al. ( | ns | |||
| 4 | rs16891982 | 5 | SLC45A2 | C/G | 33,951,587 | Liu et al. ( | 0.142 (8.13e-58) | 5 | 0.27912209 | 1.72E-08 |
| 5 | rs2287949 | 5 | SLC45A2 | C/T | 33,954,405 | Stokowski et al. ( | 0.006 (0.004) | |||
| 6 | rs28777 | 5 | SLC45A2 | G/T | 33,958,853 | Branicki et al. ( | 0.097 (3.14E-40) | 24 | 8.65E-02 | 7.57E-02 |
| 7 | rs26722 | 5 | SLC45A2 | A/G | 33,963,764 | Han et al. ( | ns | |||
| 8 | rs6867641 | 5 | SLC45A2 | C/T | 33,985,751 | Graf et al. ( | ns | |||
| 9 | rs13289 | 5 | SLC45A2 | C/G | 33,986,303 | Graf et al. ( | 0.0114 (5.8E-05) | |||
| 10 | rs1936208 | 6 | Intergenic between ATP5F1P6 and LOC100129554 | C/T | 139,644,247 | | ns | | | |
| 11 | rs12203592 | 6 | IRF4 | C/T | 396,320 | Branicki et al. ( | 0.0201 (5.18e-09) | 2 | −0.17565966 | 1.97E-12 |
| 12 | rs4959270 | 6 | LOC105374875 | A/C | 457,747 | Branicki et al. ( | ns | |||
| 13 | rs477823 | 7 | <NA> | G/T | 63,287,722 | | 0.0068 (0.001) | | | |
| 14 | rs1385229 | 8 | C8orf37-AS1 | A/G | 95,759,318 | | ns | | | |
| 15 | rs10756819 | 9 | BNC2 | A/G | 16,858,085 | Liu et al. ( | 0.021 (2.48E-09) | 36 | 1.32E-03 | 9.46E-01 |
| 16 | rs683 | 9 | TYRP1 | A/C | 12,709,304 | Branicki et al. ( | 0.0096 (4.6E-05) | 32 | 1.70E-02 | 3.83E-01 |
| 17 | rs376397 | 10 | GATA3 | A/G | 8,061,334 | | ns | | | |
| 18 | rs10443915 | 10 | PRKG1 | A/T | 52,060,818 | | ns | | | |
| 19 | rs12765852 | 10 | PRKG1 | C/T | 52,061,566 | | ns | | | |
| 20 | rs10831496 | 11 | GRM5 | A/G | 88,824,822 | Nan et al. ( | ns | |||
| 21 | rs4936890 | 11 | Intergenic between OR10G7 and OR10D5P | A/G | 124,044,034 | | 0.0113 (1.5E-05) | | | |
| 22 | rs35264875 | 11 | TPCN2 | A/T | 69,078,930 | Jacobs et al. ( | 0.0034 (0.016) | |||
| 23 | rs1042602 | 11 | TYR | A/C | 89,178,527 | Branicki et al. ( | 0.0025 (0.04) | 12 | −0.06223707 | 3.52E-03 |
| 24 | rs1393350 | 11 | TYR | A/G | 89,277,877 | Han et al. ( | 0.0109 (1.8E-05) | 21 | −5.60E-02 | 5.96E-02 |
| 25 | rs1126809 | 11 | TYR | A/G | 89,284,793 | Branicki et al. ( | 0.015 (2.2E-06) | 19 | −0.08357710 | 2.28E-02 |
| 26 | rs642742 | 12 | KITLG | A/G | 88,905,968 | Jonnalagadda et al. ( | 0.0533 (5.2E-21) | |||
| 27 | rs12821256 | 12 | KITLG | C/T | 88,934,557 | Branicki et al. ( | 0.0024 (0.046) | 33 | −1.52E-02 | 6.53E-01 |
| 28 | rs3782974 | 13 | DCT | A/T | 94,440,641 | Lao et al. ( | 0.0095 (6.6E-05) | |||
| 29 | rs2050537 | 13 | HS6ST3 | C/T | 96,608,646 | | ns | | | |
| 30 | rs4983161 | 14 | <NA> | A/T | 19,726,716 | | 0.007 (0.001) | | | |
| 31 | rs12896399 | 14 | LOC105370627 (upstream of SLC24A4) | G/T | 92,307,318 | Han et al. ( | 0.011 (1.8E-05) | 29 | -2.55E-02 | 2.08E-01 |
| 32 | rs2402130 | 14 | SLC24A4 | A/G | 92,334,858 | Branicki et al. ( | 0.027 (6.8E-12) | 27 | 3.98E-02 | 1.09E-01 |
| 33 | rs17128291 | 14 | SLC24A4 | A/G | 92,416,481 | Liu et al. ( | 0.0147 (7.28E-07) | 28 | −3.91E-02 | 1.30E-01 |
| 34 | rs12914268 | 15 | <NA> | A/G | 22,150,292 | | ns | | | |
| 35 | rs1129038 | 15 | HERC2 | A/G | 28,111,712 | Liu et al. ( | 0.092 (1.77E-37) | 17 | 0.10536412 | 8.38E-03 |
| 36 | rs12913832 | 15 | HERC2 | A/G | 28,120,471 | Branicki et al. ( | 0.091 (9.9E-37) | 20 | 8.12E-02 | 3.45E-02 |
| 37 | rs2238289 | 15 | HERC2 | C/T | 28,208,068 | Mengel-From et al. ( | 0.033 (5.24E-14) | 15 | −0.11378297 | 8.00E-03 |
| 38 | rs8182028 | 15 | HERC2 | C/T | 28,222,788 | Liu et al. ( | ns | |||
| 39 | rs3940272 | 15 | HERC2 | A/C | 28,223,576 | Eiberg et al. ( | ns | |||
| 40 | rs6497292 | 15 | HERC2 | A/G | 28,251,048 | Kayser et al. ( | 0.075 (2.29E-30) | 30 | 5.79E-02 | 2.27E-01 |
| 41 | rs16950941 | 15 | HERC2 | A/G | 28,257,597 | Liu et al. ( | ns | |||
| 42 | rs1667394 | 15 | HERC2 | A/G | 28,285,035 | Duffy et al. ( | 0.052 (1.15E-21) | 6 | 0.16017374 | 4.70E-08 |
| 43 | rs1473917 | 15 | LOC101927079 | C/T | 22,067,210 | | ns | | | |
| 44 | rs1545397 | 15 | OCA2 | A/T | 27,942,625 | Edwards et al. ( | 0.0166 (2.27E-07) | 34 | −1.03E-02 | 7.51E-01 |
| 45 | rs1800414 | 15 | OCA2 | A/G | 27,951,890 | Donnelly et al. ( | 0.047 (2.79E-19) | 4 | −0.53990294 | 6.12E-11 |
| 46 | rs1800407 | 15 | OCA2 | A/G | 27,985,171 | Branicki et al. ( | 0.007 (4.4E-04) | 8 | −0.19827349 | 1.20E-06 |
| 47 | rs1800401 | 15 | OCA2 | C/T | 28,014,906 | Branicki et al. ( | 0.0054 (0.005) | |||
| 48 | rs12441727 | 15 | OCA2 | A/G | 28,026,628 | Liu et al. ( | 0.0047 (0.005) | 25 | 6.03E-02 | 8.23E-02 |
| 49 | rs1448485 | 15 | OCA2 | A/C | 28,037,594 | Duffy et al. ( | ns | |||
| 50 | rs16950821 | 15 | OCA2 | A/G | 28,038,360 | Branicki et al. ( | 0.037 (3.6E-15) | |||
| 51 | rs1470608 | 15 | OCA2 | A/C | 28,042,974 | Branicki et al. ( | 0.063 (1.04E-25) | 31 | −3.79E-02 | 2.66E-01 |
| 52 | rs7495174 | 15 | OCA2 | A/G | 28,099,091 | Branicki et al. ( | ns | |||
| 53 | rs1426654 | 15 | SLC24A5 | A/G | 48,134,286 | Lamason et al. ( | 0.15 (1.19E-59) | 1 | 0.52412661 | 1.92E-23 |
| 54 | rs11076649 | 16 | AFG3L1P | C/G | 89,992,927 | | 0.0058 (0.002) | | | |
| 55 | rs3114908 | 16 | ANKRD11 | A/G | 89,317,316 | Law et al. ( | 0.0201 (9.8E-09) | 35 | 3.93E-03 | 8.56E-01 |
| 56 | rs8049897 | 16 | DEF8 | A/G | 89,957,793 | Han et al. ( | 0.022 (1.5E-09) | |||
| 57 | rs8051733 | 16 | DEF8 | A/G | 89,957,797 | Law et al. ( | 0.029 (2.7E-12) | 16 | −0.06364481 | 8.16E-03 |
| 58 | rs164741 | 16 | DPEP1 | C/T | 89,625,889 | Han et al. ( | 0.015 (2.76E-07) | |||
| 59 | rs2239359 | 16 | FANCA | C/T | 89,783,071 | | ns | | | |
| 60 | rs3212355 | 16 | MC1R | C/T | 89,917,969 | Valenzuela et al. ( | 0.0206 (2.89E-08) | 22 | 2.00E-01 | 6.14E-02 |
| 61 | rs312262906 (N29insA) | 16 | MC1R | INDEL -/insA | 89,919,341 | Branicki et al. ( | 0.0085 (1.2E-04) | |||
| 62 | rs1805005 | 16 | MC1R | G/T | 89,919,435 | Branicki et al. ( | ns | |||
| 63 | rs1805006 | 16 | MC1R | A/C | 89,919,509 | Branicki et al. ( | 0.003 (2.2E-02) | 13 | −0.31065309 | 5.63E-03 |
| 64 | rs2228479 | 16 | MC1R | A/G | 89,919,531 | Branicki et al. ( | 0.019 (7.45E-09) | 11 | −0.10915180 | 1.70E-03 |
| 65 | rs11547464 | 16 | MC1R | A/G | 89,919,682 | Branicki et al. ( | 0.0071 (4.6E-04) | 9 | −2.96E-01 | 5.06E-04 |
| 66 | rs1805007 | 16 | MC1R | C/T | 89,919,708 | Branicki et al. ( | 0.0268 (1.28E-11) | 3 | −0.28231475 | 5.92E-12 |
| 67 | rs201326893 (Y152OCH) | 16 | MC1R | C/A | 89,919,713 | Branicki et al. ( | ns | |||
| 68 | rs1110400 | 16 | MC1R | C/T | 89,919,721 | Branicki et al. ( | 0.0037 (1.1E-02) | 18 | −0.20059956 | 1.02E-02 |
| 69 | rs1805008 | 16 | MC1R | C/T | 89,919,735 | Branicki et al. ( | 0.021 (9.2E-10) | 7 | −0.19994906 | 1.25E-07 |
| 70 | rs885479 | 16 | MC1R | A/G | 89,919,746 | Branicki et al. ( | 0.0326 (7.63E-14) | 10 | −0.16300889 | 5.42E-04 |
| 71 | rs1805009 | 16 | TUBB3 | C/G | 89,920,137 | Branicki et al. ( | ns | |||
| 72 | rs333113 | 17 | SPNS2 | C/G | 4,497,060 | | 0.013 (2.41E-06) | | | |
| 73 | rs6119471 | 20 | ASIP | C/G | 34,197,405 | Hart et al. ( | 0.214 (4.76E-85) | 26 | 9.27E-02 | 9.51E-02 |
| 74 | rs2424984 | 20 | ASIP | C/T | 34,262,568 | Valenzuela et al. ( | 0.044 (2.06E-17) | |||
| 75 | rs1885120 | 20 | MYH7B | C/G | 34,989,185 | Liu et al. ( | 0.003 (0.039) | |||
| 76 | rs2378249 | 20 | PIGU | A/G | 34,630,285 | Branicki et al. ( | 0.008 (1.4E-04) | 23 | −4.76E-02 | 7.36E-02 |
| 77 | rs6059655 | 20 | RALY | A/G | 34,077,941 | Jacobs et al. ( | 0.008 (4.2E-04) | 14 | −0.11371271 | 7.23E-03 |
ns: not significant
Fig. 1 Illustration of the cumulative contribution of each of the 36 selected SNP predictors to the AUC prediction accuracy for the 5 skin colour categories, based on the full set of 1423 individuals. SNP predictors were added to the prediction model one by one, in sequential order from highest to lowest prediction rank. Each colour-coded line represents one of the 5 DNA-predicted skin colour categories. Skin colour phenotyping was via skin types derived from the Fitzpatrick scale
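The procedure described in the Fig. 1 caption, adding predictors one at a time in rank order and recording the AUC after each addition, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the genotypes and labels are synthetic, the score is a plain weighted sum of allele counts rather than a refitted regression model, the weights are made up, and the names `auc` and `cumulative_auc` are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cumulative_auc(genotypes, labels, ranked_weights):
    """Return the AUC obtained with the top 1, top 2, ... predictors.

    genotypes: one list of allele counts (0, 1 or 2) per individual,
               with columns already sorted from highest to lowest rank.
    ranked_weights: one (hypothetical) weight per column, same order.
    """
    curve = []
    for k in range(1, len(ranked_weights) + 1):
        scores = [
            sum(w * g for w, g in zip(ranked_weights[:k], row[:k]))
            for row in genotypes
        ]
        curve.append(auc(scores, labels))
    return curve


# Synthetic example: 200 individuals, 3 predictors with made-up weights.
rng = random.Random(0)
weights = [1.0, 0.6, 0.3]
genotypes = [[rng.choice([0, 1, 2]) for _ in weights] for _ in range(200)]
# Binary label "in category vs. not", generated from a noisy weighted sum.
labels = [
    1 if sum(w * g for w, g in zip(weights, row)) + rng.gauss(0, 0.5) > 1.9 else 0
    for row in genotypes
]

curve = cumulative_auc(genotypes, labels, weights)
for k, value in enumerate(curve, start=1):
    print(f"top {k} predictor(s): AUC = {value:.3f}")
```

For a multi-category model such as the 5-category one, the same loop would be run once per category with a one-vs-rest label, giving one curve per category.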
Contribution of each of the 36 selected SNP predictors of skin colour towards binomial prediction categories, in terms of the beta coefficients and their statistical significance, within the 5-category skin colour prediction model
Fig. 2 Illustration of the prediction performance of the set of 36 SNPs for the 5-category (a) and the 3-category (b) skin colour prediction model using ROC curves with AUC estimates (including the cross-validated measures) using the full training set of 1423 individuals from 29 populations. Skin colour phenotyping was via skin types derived from the Fitzpatrick scale
Model performance comparison of the 10-SNP set Bayes classifier by Maroñas et al. (2014) and the 36-SNP set prediction model from the present study, using the independent “model comparison set” of 194 individuals from 17 populations not previously used for marker discovery. The phenotyping method previously employed by Maroñas et al. (2014) was applied to allow direct comparison of the two prediction approaches
| | AUC | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| Bayes classifier 10-SNP model Maroñas et al. ( | |||||
| White | 0.79 | 0.97 | 0.62 | 0.84 | 0.91 |
| Int | 0.63 | 0.37 | 0.88 | 0.47 | 0.83 |
| Black | 0.64 | 0.30 | 0.98 | 0.67 | 0.92 |
| 36-SNP set model current study | |||||
| White | 0.82 | 0.99 | 0.65 | 0.86 | 0.98 |
| Int | 0.62 | 0.26 | 0.98 | 0.79 | 0.82 |
| Black | 0.92 | 0.90 | 0.94 | 0.64 | 0.99 |
| 36-SNP set model current study—Fitzpatrick scale* | |||||
| Light | 0.92 | 0.99 | 0.85 | 0.95 | 0.98 |
| Dark | 0.74 | 0.50 | 0.99 | 0.86 | 0.93 |
| Dark-Black | 0.94 | 0.92 | 0.96 | 0.79 | 0.99 |
* The 36-SNP set model performance assessment using Fitzpatrick scale phenotypes as the observed phenotype
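The sensitivity, specificity, PPV and NPV columns in the table above are standard one-vs-rest measures derived from a confusion matrix. A minimal sketch of how such values are computed for a single category is given below. The observed and predicted labels are invented for illustration and are not data from the study, and the function names `confusion_counts` and `diagnostic_metrics` are hypothetical.

```python
import math


def confusion_counts(observed, predicted, category):
    """Count TP, FP, TN, FN for one category treated as 'positive'
    against all other categories (one-vs-rest)."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == category and obs == category:
            tp += 1
        elif pred == category:
            fp += 1
        elif obs == category:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions of the four measures."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positives among observed positives
        "specificity": _ratio(tn, tn + fp),  # true negatives among observed negatives
        "ppv": _ratio(tp, tp + fp),          # correct among predicted positives
        "npv": _ratio(tn, tn + fn),          # correct among predicted negatives
    }


# Invented labels for eight hypothetical individuals.
observed = ["White", "White", "Int", "Black", "White", "Int", "Black", "White"]
predicted = ["White", "White", "White", "Black", "White", "Int", "Int", "White"]

counts = confusion_counts(observed, predicted, "White")
metrics = diagnostic_metrics(*counts)
print(counts, metrics)
```

Repeating the call with `"Int"` and `"Black"` as the category yields one row of measures per category, matching the layout of the table.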
Fig. 3 Proof-of-principle illustration of the power of the developed model for predicting skin colour on a global scale, regardless of bio-geographic ancestry. Probability outputs from the 5-category skin colour prediction model based on genotypes of the 36-SNP set are shown together with a skin image of the respective DNA donor. Fourteen individuals were chosen from the ‘model comparison set’ based on their parental country of birth, both in and outside the US, representing globally distributed individuals. The images are ordered 1–14, with the following parental birth countries recorded: 1-US, 2-US, 3-US, 4-US, 5-Syria, 6-Colombia, 7-China, 8-Vietnam, 9-El Salvador, 10-India, 11-Mexico, 12-Nigeria, 13-Vietnam, 14-Nigeria