Iuliana Ionita-Laza, Kenneth McCallum, Bin Xu, Joseph D Buxbaum.
Abstract
Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequences. Such annotations can have a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest. The main challenges in using these various annotations include their large numbers and their diversity. Here we develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. We show that the resulting meta-score has better discriminatory ability using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions) than the recently proposed CADD score. Across varied scenarios, the Eigen score performs generally better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
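To make the idea of an unsupervised meta-score concrete, here is a minimal sketch of one simple way to combine several annotations without labels: weight each standardized annotation by the leading eigenvector of the annotation correlation matrix and sum. This is an illustrative assumption, not necessarily the procedure used to build Eigen; the function names (`standardize`, `leading_eigenvector`, `pc_meta_score`) and the toy annotation values are hypothetical.

```python
import math

def standardize(columns):
    """Center each annotation column and scale it to unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / sd for v in col])
    return out

def leading_eigenvector(matrix, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc_meta_score(columns):
    """Return (weights, per-variant scores) from the first principal component."""
    z = standardize(columns)
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    weights = leading_eigenvector(corr)
    scores = [sum(weights[i] * z[i][s] for i in range(k)) for s in range(n)]
    return weights, scores

# Hypothetical toy data: three annotations measured on five variants.
annotations = [
    [0.1, 0.4, 0.5, 0.9, 1.0],
    [1.0, 2.0, 2.5, 4.0, 5.0],
    [0.2, 0.1, 0.6, 0.7, 0.9],
]
weights, scores = pc_meta_score(annotations)
```

No labeled pathogenic or benign variants enter the computation; the weights come only from how the annotations co-vary, which is the sense in which such a score is unsupervised.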
Year: 2016 PMID: 26727659 PMCID: PMC4731313 DOI: 10.1038/ng.3477
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Correlation among different functional annotations for the noncoding variants on chromosome 1 in the training dataset. Supplementary Figure 1 contains the correlation plot for non-synonymous coding variants.
P values (Wilcoxon rank-sum test) for MLL2, CFTR, BRCA1, BRCA2, contrasting pathogenic variants with benign variants in the ClinVar database. The best performing individual annotation is also reported (for missense variants only).
| Gene | n | Variant type | Score | P value |
|---|---|---|---|---|
| | | | | 1.6E-50 |
| | | | CADD-score v1.0 | 1.2E-42 |
| | | | CADD-score v1.1 | 1.3E-49 |
| | 31 | Missense | | 3.1E-13 |
| | | | | 5.1E-13 |
| | | | CADD-score v1.0 | 2.8E-02 |
| | | | CADD-score v1.1 | 2.8E-06 |
| | | | SIFT | 6.8E-15 |
| | 160 | Missense and Nonsense | | 1.3E-69 |
| | | | | 8.2E-65 |
| | | | CADD-score v1.0 | 1.1E-65 |
| | | | CADD-score v1.1 | 3.1E-39 |
| | 92 | Missense | | 2.8E-37 |
| | | | | 9.6E-37 |
| | | | CADD-score v1.0 | 7.9E-35 |
| | | | CADD-score v1.1 | 1.7E-21 |
| | | | PolyPhenVar | 4.8E-36 |
| | 125 | Missense and Nonsense | | 2.5E-38 |
| | | | | 6.0E-25 |
| | | | CADD-score v1.0 | 2.2E-28 |
| | | | CADD-score v1.1 | 1.3E-22 |
| | 28 | Missense | | 4.0E-03 |
| | | | | 1.6E-02 |
| | | | CADD-score v1.0 | 5.0E-03 |
| | | | CADD-score v1.1 | 1.4E-03 |
| | | | SIFT | 1.0E-05 |
| | 110 | Missense and Nonsense | | 9.8E-28 |
| | | | | 3.3E-14 |
| | | | CADD-score v1.0 | 1.5E-46 |
| | | | CADD-score v1.1 | 7.7E-40 |
| | 13 | Missense | | 2.3E-01 |
| | | | | 3.5E-01 |
| | | | CADD-score v1.0 | 3.6E-01 |
| | | | CADD-score v1.1 | 1.8E-02 |
| | | | MA | 9.5E-03 |
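The P values in these tables come from Wilcoxon rank-sum tests contrasting the scores of two variant groups (for example, pathogenic versus benign). As a rough illustration of how such a P value can be computed, the sketch below implements the two-sided test with midranks for ties and a tie-corrected normal approximation. The function name `rank_sum_test` and the example scores are hypothetical, and the authors' exact settings (exact versus approximate test, continuity correction) are not stated here, so treat this as one reasonable variant rather than their procedure.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses midranks for ties and a tie-corrected normal approximation without
    continuity correction. Returns (U statistic for x, two-sided p value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p

# Hypothetical functional scores for two small groups of variants.
pathogenic = [2.1, 1.8, 3.0, 2.5, 1.2]
benign = [0.3, -0.5, 0.9, 0.1, -1.0]
u_stat, p_value = rank_sum_test(pathogenic, benign)
```

A smaller P value indicates that the score separates the two groups more clearly, which is how the rows for different scores within one gene and variant type can be compared.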
Figure 2. Violin plots for Eigen scores for de novo mutations in ID, EPI, ASD-FMRP, ASD, SCZ and CTRL. The horizontal line corresponds to the median Eigen score for de novo CTRL mutations (the lowest scoring set).
P values (Wilcoxon rank-sum test) for de novo mutations in ASD, EPI, ID, and SCZ studies. ASD-FMRP analyses are based on de novo mutations in ASD cases that hit FMRP targets. The best performing individual annotation is also reported (for missense variants only).
| Disease | n | Variant type | Score | P value |
|---|---|---|---|---|
| ASD | 2,027 | Missense and Nonsense | | 6.0E-03 |
| | | | | 1.6E-02 |
| | | | CADD-score v1.0 | 8.4E-02 |
| | | | CADD-score v1.1 | 3.2E-01 |
| | 1,753 | Missense only | | 9.0E-02 |
| | | | | 1.5E-01 |
| | | | CADD-score v1.0 | 7.4E-01 |
| | | | CADD-score v1.1 | 5.8E-01 |
| | | | PolyPhenDiv | 5.4E-02 |
| ASD-FMRP | 132 | Missense and Nonsense | | 4.2E-05 |
| | | | | 9.4E-06 |
| | | | CADD-score v1.0 | 5.5E-03 |
| | | | CADD-score v1.1 | 4.7E-03 |
| | 113 | Missense only | | 3.2E-04 |
| | | | | 9.4E-05 |
| | | | CADD-score v1.0 | 4.2E-02 |
| | | | CADD-score v1.1 | 1.7E-02 |
| | | | MA | 1.0E-04 |
| EPI | 210 | Missense and Nonsense | | 3.1E-03 |
| | | | | 5.0E-03 |
| | | | CADD-score v1.0 | 4.0E-02 |
| | | | CADD-score v1.1 | 2.0E-01 |
| | 184 | Missense only | | 6.0E-03 |
| | | | | 1.3E-02 |
| | | | CADD-score v1.0 | 8.1E-02 |
| | | | CADD-score v1.1 | 1.7E-01 |
| | | | PolyPhenVar | 3.0E-03 |
| ID | 114 | Missense and Nonsense | | 1.7E-06 |
| | | | | 1.1E-06 |
| | | | CADD-score v1.0 | 3.7E-06 |
| | | | CADD-score v1.1 | 9.5E-03 |
| | 99 | Missense only | | 6.7E-05 |
| | | | | 6.0E-05 |
| | | | CADD-score v1.0 | 3.5E-05 |
| | | | CADD-score v1.1 | 3.3E-02 |
| | | | MA | 1.0E-04 |
| SCZ | 636 | Missense and Nonsense | | 9.9E-01 |
| | | | | 9.8E-01 |
| | | | CADD-score v1.0 | 1.5E-01 |
| | | | CADD-score v1.1 | 1.8E-01 |
| | 573 | Missense only | | 6.3E-01 |
| | | | | 5.8E-01 |
| | | | CADD-score v1.0 | 9.8E-01 |
| | | | CADD-score v1.1 | 2.8E-02 |
| | | | PhastPri | 9.5E-02 |
P values (Wilcoxon rank-sum test) for GWAS SNPs and eQTLs. Comparisons are shown between GWAS index SNPs and tag SNPs hitting regulatory elements. Also shown are comparisons between GWAS index SNPs and control SNPs matched for frequency, functional consequence, and GWAS array availability. Additionally, comparisons between eQTLs and tag SNPs hitting regulatory elements are shown. The best performing individual annotation is also reported.
| Dataset | n | Comparison | Score | P value |
|---|---|---|---|---|
| GWAS | 2,115 | Regulatory GWAS vs. Tag SNPs | | 1.2E-05 |
| | | | | 4.0E-06 |
| | | | CADD-score v1.0 | 5.9E-04 |
| | | | CADD-score v1.1 | 2.0E-04 |
| | | | GWAVA (TSS) | 4.1E-06 |
| | | | TFBS num | 4.9E-05 |
| GWAS | 2,115 | Regulatory GWAS vs. Other SNPs | | 1.6E-09 |
| | | | | 2.0E-13 |
| | | | CADD-score v1.0 | 2.0E-06 |
| | | | CADD-score v1.1 | 8.6E-07 |
| | | | GWAVA (TSS) | 7.4E-13 |
| | | | TFBS sum | 5.6E-09 |
| GWAS | 10,718 | GWAS vs. Matched Controls | | 6.9E-08 |
| | | | | 3.5E-13 |
| | | | CADD-score v1.0 | 1.0E-04 |
| | | | CADD-score v1.1 | 5.2E-07 |
| | | | GWAVA (TSS) | 2.5E-09 |
| | | | H3K4Me1 | 4.0E-11 |
| eQTLs | 676 | Regulatory eQTLs vs. Tag SNPs | | 1.8E-10 |
| | | | | 7.0E-23 |
| | | | CADD-score v1.0 | 3.1E-04 |
| | | | CADD-score v1.1 | 4.3E-05 |
| | | | GWAVA (TSS) | 1.3E-03 |
| | | | H3K4Me3 | 2.2E-24 |
| eQTLs | 676 | Regulatory eQTLs vs. Other SNPs | | 5.9E-13 |
| | | | | 2.6E-27 |
| | | | CADD-score v1.0 | 2.8E-04 |
| | | | CADD-score v1.1 | 2.1E-05 |
| | | | GWAVA (TSS) | 7.3E-08 |
| | | | H3K4Me3 | 3.8E-25 |
P values (Wilcoxon rank-sum test) for somatic mutations (recurrent vs. non-recurrent) in the COSMIC database. Comparisons are done for variants in different functional categories. n-rec is the number of recurrent somatic mutations, and n-nonrec is the number of nonrecurrent somatic mutations. The best performing individual functional annotation is also reported.
| Variant Class | n-rec | n-nonrec | | | CADD-score v1.0 | CADD-score v1.1 | Best Individual Annotation |
|---|---|---|---|---|---|---|---|
| Regulatory | 21,279 | 428,398 | 2.02E-165 | 5.13E-264 | 1.05E-71 | 2.70E-50 | ≤2.22E-308 (PolIIPval) |
| Intronic | 85,502 | 2,093,158 | 2.40E-155 | 2.13E-112 | 2.89E-61 | 1.09E-10 | ≤2.22E-308 (GERP NR) |
| Downstream | 15,956 | 318,967 | 2.73E-92 | 3.04E-128 | 4.31E-36 | 1.83E-28 | 1.01E-155 (GERP NR) |
| Upstream | 14,636 | 309,615 | 1.28E-52 | 2.01E-84 | 7.90E-24 | 3.21E-17 | 9.68E-86 (PolIIPval) |
| Noncoding Change | 4,903 | 66,717 | 2.51E-07 | 2.49E-21 | 1.51E-01 | 4.84E-05 | 8.13E-35 (PolIIPval) |
| 3′UTR | 2,236 | 28,261 | 6.94E-03 | 4.22E-04 | 1.06E-05 | 3.37E-01 | 5.67E-05 (GERP NR) |
| 5′UTR | 417 | 3,908 | 1.14E-02 | 2.32E-01 | 6.43E-02 | 1.15E-01 | 2.79E-07 (GERP NR) |
| Intergenic | 75,327 | 2,182,466 | 1.49E-02 | 3.97E-06 | 1.08E-06 | 6.30E-16 | 1.19E-18 (H3K4Me1) |
| Synonymous | 434 | 2,388 | 1.09E-01 | 9.69E-01 | 8.25E-01 | 2.88E-01 | 2.16E-03 (PhyloPri) |
Figure 3. Violin plots for Eigen scores for noncoding variants in the COSMIC database that reside in different functional categories. The horizontal line corresponds to the median Eigen score for intergenic variants (the lowest scoring class).