Stefanie Friedrichs, Dörthe Malzahn, Elizabeth W Pugh, Marcio Almeida, Xiao Qing Liu, Julia N Bailey.
Abstract
High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses the diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 were significantly associated with hypertension; additionally, rare variants in SNUPN were significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling the false discovery rate (FDR). Indeed, power improved when gene expression data were used for FDR-controlled informative weighting of association test p values of genes. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing of rare and common variants, which was observed to depend on linkage disequilibrium structure.
Year: 2016 PMID: 26866982 PMCID: PMC4895695 DOI: 10.1186/s12863-015-0313-x
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Statistical tests and analyzed data
| Marker data | Data set | Statistical tests | Covariates | Trait(s) |
|---|---|---|---|---|
| Sequence | Family study | Single-variant regression in SOLAR | Smoking, BP medication, PC1-3, sex, age, age², sex × age, sex × age² | Real SBP and DBP at first time point, own simulated trait for H0 |
| Chr3: GWASmp and sequence | Unrelated individuals (from family study) | Regress pairwise DBP residual difference and sum on IBD sharing status; sequence data analyses by SKAT-O | Sex, age, smoking, PC1-3 | Real DBP at first time point |
| Sequence | Family study | Informative SNV weights in burden test T5 and SKAT; with R: seqMeta | Age, sex, smoking, BP medication | Real SBP at earliest available measurement |
| Exome sequence | Unrelated individuals (large Hispanic sample) | LRT, C- | None | Simulated HT status; real SBP, DBP with cutoffs for case-control status |
| Sequence and GWASmp | Family study | SKAT with R (coxme, kinship2, QuadCompForm); strategies for joint testing of rare and common SNVs | Sex, age, sex × age; subjects not on BP medication | Real and simulated SBP at first time point |
| Sequence and GWASmp | Family study, including gene expression data | Seq-aSum-VS burden test; regression on gene expression data; gene set enrichment analysis | PC1-3 | Average real SBP and DBP |
BP blood pressure, Chr Chromosome, CMC Combined multivariate collapsing, DBP diastolic blood pressure, GWASmp genome-wide association study marker panel, HT hypertension, IBD identity-by-descent, LRT likelihood ratio test, PC principal component, SBP systolic blood pressure, SKAT sequence kernel association test, SNV single nucleotide variant, Seq-aSum-VS sequential sum
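The idea of informative SNV weights in a burden test can be illustrated with a minimal sketch. It assumes unrelated individuals, a hypothetical per-variant functional score in [0, 1] as the weight, and a 5 % MAF cutoff in the spirit of a T5-style filter. It does not reproduce the seqMeta analysis listed in the table, which additionally accounts for family structure and covariates; all function names and the toy data are illustrative.

```python
from math import sqrt


def weighted_burden_scores(genotypes, weights, mafs, maf_cutoff=0.05):
    """Collapse rare variants into one weighted burden score per individual.

    genotypes: one row per individual, each a list of minor-allele counts (0, 1, 2).
    weights:   one functional weight per variant (hypothetical scores in [0, 1]).
    mafs:      minor allele frequency per variant.
    """
    # Keep only variants at or below the MAF cutoff.
    keep = [j for j, maf in enumerate(mafs) if maf <= maf_cutoff]
    return [sum(weights[j] * row[j] for j in keep) for row in genotypes]


def burden_regression(burden, trait):
    """Ordinary least-squares slope and t statistic of trait on burden score.

    Assumes unrelated individuals and no covariates (a simplification).
    """
    n = len(burden)
    mean_x = sum(burden) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(burden, trait))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, t_stat


# Toy data (made up for illustration): 6 individuals, 3 variants.
genotypes = [[0, 0, 1], [1, 0, 0], [0, 1, 2], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
weights = [0.9, 0.1, 0.5]   # hypothetical functional scores
mafs = [0.01, 0.03, 0.20]   # the third variant is common and gets filtered out
trait = [118.0, 131.0, 121.0, 117.0, 135.0, 120.0]

burden = weighted_burden_scores(genotypes, weights, mafs)
slope, t_stat = burden_regression(burden, trait)
print(burden, slope, t_stat)
```

Because the weights enter the score linearly, a variant with a weight near zero contributes almost nothing to the test, which is one way to see why the choice of weights can influence power so strongly.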
Filters, priors, and findings
| Filter | Prior | Conclusions | Annotation |
|---|---|---|---|
| Functional annotation, LD-corrected effective number of tests | None | LD-correction in WGS reduces multiple-testing burden by 85%, significant associations: | Location: ANNOVAR; functional annotation: PolyPhen, SIFT |
| IBD sharing | None | No significances, | IBD mapping: BEAGLE; functional annotation: CADD |
| Sliding window on MAF ≤5% SNVs | | Significant association: | Functional annotation: ENCODE, RegulomeDB, PolyPhen2 |
| Genes, exome-sequence | | Top-ranked genes differ between weighted burden tests LRT, C-α, CMC; but good overlap with literature | ANNOVAR, variant tools; random forest classifiers assign SNVs to protein binding sites; DSSP, PSAIA, DOMINO |
| Gene covering LD-blocks | | SKAT: power depends on SNV weights, exploiting LD is very beneficial, optimal strategy for joint testing rare and common SNVs depends on LD structure | Haploview with HapMap data for LD-calculation |
| Rare SNVs in genes with >1 and <50 rare SNVs (MAF < 0.01) | | Power of burden tests improved by incorporating phenotype associated gene expression into | Genes: hg19; GO biological process categories |
CADD combined annotation dependent depletion, DBP diastolic blood pressure, DOMINO database of domain–peptide interactions, DSSP define secondary structure of proteins, ENCODE encyclopedia of DNA elements, GO gene ontology, IBD identity-by-descent, LD linkage disequilibrium, MAF minor allele frequency, PSAIA protein structure and interaction analyzer, SBP systolic blood pressure, SIFT sorting intolerant from tolerant, SKAT sequence kernel association test, SNV single nucleotide variant, WGS whole genome sequence
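The table mentions an LD-corrected effective number of tests without stating the estimator here. As a hedged illustration only, the sketch below uses one common choice, the Cheverud-Nyholt estimate M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix between variants. Since these eigenvalues have mean 1 and their squares sum to the sum of squared matrix entries, the variance can be computed without an eigendecomposition. The function names and the Bonferroni-style adjustment are assumptions, not the procedure used in the study.

```python
def effective_number_of_tests(corr):
    """Cheverud-Nyholt style estimate of the effective number of independent tests.

    corr: symmetric M x M correlation matrix between variants (list of lists).
    """
    m = len(corr)
    if m == 1:
        return 1.0
    # Eigenvalues of a correlation matrix sum to M (mean 1); their squares sum
    # to the sum of squared entries, so the sample variance follows directly.
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def adjusted_alpha(corr, alpha=0.05):
    """Bonferroni-style per-test level using M_eff in place of M."""
    return alpha / effective_number_of_tests(corr)


# Toy correlation matrix (made up): two variants in strong LD, one independent.
corr = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(effective_number_of_tests(corr), adjusted_alpha(corr))
```

With uncorrelated variants the estimate equals M, and with perfectly correlated variants it drops to 1, so the per-test significance level relaxes as LD increases.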
Fig. 1 Informed p value weighting for genes based on conditionally independent associations between rare variant burden, gene expression, and trait. The p value weight was defined as the product of the association strengths of rare SNV burden with gene expression and gene expression with trait value
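The product weight described in this caption can be combined with FDR control in several ways. The sketch below assumes a weighted Benjamini-Hochberg procedure, in which weights are rescaled to average 1 and each p value is divided by its weight before the usual step-up rule is applied. How "association strength" is measured is not specified here, so the inputs are treated as hypothetical nonnegative numbers, and the function names are illustrative.

```python
def pvalue_weights(burden_expr_strength, expr_trait_strength):
    """Product weights per gene, rescaled so that they average 1."""
    raw = [a * b for a, b in zip(burden_expr_strength, expr_trait_strength)]
    mean_raw = sum(raw) / len(raw)
    if mean_raw == 0:
        # No prior information at all: fall back to uniform weights.
        return [1.0] * len(raw)
    return [w / mean_raw for w in raw]


def weighted_bh(pvalues, weights, fdr=0.05):
    """Indices of hypotheses rejected by weighted Benjamini-Hochberg."""
    m = len(pvalues)
    # Divide each p value by its weight; a zero weight means never reject.
    q = [min(p / w, 1.0) if w > 0 else 1.0 for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    # Find the largest rank k whose weighted p value is at most k * fdr / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * fdr / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Toy example (made-up numbers): four genes, the third has strong prior support.
pvalues = [0.001, 0.02, 0.04, 0.5]
weights = pvalue_weights([1.0, 1.0, 3.0, 1.0], [1.0, 1.0, 1.0, 1.0])
print(weighted_bh(pvalues, [1.0] * 4))  # unweighted reference
print(weighted_bh(pvalues, weights))    # with informative weights
```

In this toy example the unweighted procedure rejects the first two genes, while the informative weights also bring in the third gene. Rescaling the weights to mean 1 is what keeps the overall FDR budget unchanged: power is shifted toward genes with prior support rather than added for free.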
Fig. 2 SNV effect sizes on GAW19 simulated DBP increase with increasing PolyPhen2 scores. Depicted are 6 genes with a range of SNV effect sizes that could be simultaneously displayed. Symbols depict SNVs in the same gene: LEPR (▲), TNN (♣), HIF3A (●), MAP4 (♥), MUC13 (✷), CGN (■)
Fig. 3 Comparison between the PolyPhen2, SIFT, and RegulomeDB functional prediction scores. Left column: Correlation of PolyPhen2 functional prediction scores with (a) SIFT or (c) RegulomeDB scores. Functional scores were transformed to have the same directionality. Nonsynonymous coding SNVs that alter the protein function should receive a PolyPhen2 score of 1 and a SIFT score of 0. Scores are metric and can be categorized as displayed. RegulomeDB annotates regulatory SNVs by an ordinal score ranging from the highest evidence (eQTL, expression quantitative trait locus) to the lowest. Right column: Filters or priors based on (b) SIFT or (d) RegulomeDB functional scores are partially mismatched on GAW19 simulated DBP. Symbols depict SNVs in the same gene: LEPR (▲), TNN (♣), HIF3A (●), MAP4 (♥), MUC13 (✷), CGN (■)