N Hernández1, J Soenksen2,3, P Newcombe1, M Sandhu4, I Barroso2, C Wallace1,5, J L Asimit6.
Abstract
Joint fine-mapping that leverages information between quantitative traits could improve accuracy and resolution over single-trait fine-mapping. Using summary statistics, flashfm (flexible and shared information fine-mapping) fine-maps signals for multiple traits, allowing for missing trait measurements and use of related individuals. In a Bayesian framework, prior model probabilities are formulated to favour model combinations that share causal variants to capitalise on information between traits. Simulation studies demonstrate that both approaches produce broadly equivalent results when traits have no shared causal variants. When traits share at least one causal variant, flashfm reduces the number of potential causal variants by 30% compared with single-trait fine-mapping. In a Ugandan cohort with 33 cardiometabolic traits, flashfm gave a 20% reduction in the total number of potential causal variants from single-trait fine-mapping. Here we show flashfm is computationally efficient and can easily be deployed across publicly available summary statistics for signals in up to six traits.
Year: 2021 PMID: 34686674 PMCID: PMC8536717 DOI: 10.1038/s41467-021-26364-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Schematic diagram for flashfm.
Flashfm is used for multiple quantitative traits that are measured in the same studies, allowing for missing measurements and family data. First, standard analysis of single-trait fine-mapping is needed for each trait. Then the model posterior probabilities (PPs) from each of these marginal fine-mapping analyses are combined in flashfm, using an approximation to the joint PP, based on an approximation of the joint Bayes’ factor. In addition to a SNP correlation matrix, a trait covariance approximation is also needed. Information is shared between traits via a sharing prior that upweights joint models with shared causal variants by a factor of κ. Memory requirements are reduced by storing only the trait-adjusted marginal PPs for each trait.
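The effect of the sharing prior can be illustrated with a deliberately simplified two-trait sketch. It is not the flashfm implementation: the function names, the toy PPs and the value of `kappa` are all hypothetical, and the sketch assumes the joint weight of a model pair is just the product of the single-trait PPs times the sharing factor, ignoring the trait-covariance term of the joint Bayes' factor described above.

```python
from itertools import product


def shares_variant(model_a, model_b):
    """True when two models (tuples of labels) have a causal variant in common."""
    return bool(set(model_a) & set(model_b))


def trait_adjusted_pp(pp1, pp2, kappa):
    """Toy two-trait reweighting (illustrative assumption, not flashfm itself).

    pp1, pp2: dicts mapping a model (tuple of labels) to its single-trait PP.
    kappa: hypothetical upweighting factor for model pairs sharing a variant.
    The trait-covariance part of the joint Bayes' factor is ignored here.
    """
    joint = {}
    for (m1, p1), (m2, p2) in product(pp1.items(), pp2.items()):
        weight = kappa if shares_variant(m1, m2) else 1.0
        joint[(m1, m2)] = p1 * p2 * weight
    total = sum(joint.values())

    # Marginalise the normalised joint weights back onto each trait's models.
    adj1 = {m: 0.0 for m in pp1}
    adj2 = {m: 0.0 for m in pp2}
    for (m1, m2), w in joint.items():
        adj1[m1] += w / total
        adj2[m2] += w / total
    return adj1, adj2


# Made-up single-trait PPs for two traits.
pp_trait1 = {("A", "D"): 0.5, ("B", "D"): 0.5}
pp_trait2 = {("A", "C"): 0.6, ("C",): 0.4}
adj1, adj2 = trait_adjusted_pp(pp_trait1, pp_trait2, kappa=4.0)
for model, pp in adj1.items():
    print(model, round(pp, 3))
```

In this toy setting the trait 1 model containing A rises above its single-trait PP of 0.5 because it can pair with a trait 2 model that also contains A. Setting `kappa=1.0` returns the single-trait PPs unchanged, which mirrors the statement that information is shared only through the prior.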
Fig. 2 Comparison of fine-mapping from flashfm and single-trait analyses.
When traits share a causal variant, flashfm has higher accuracy than single-trait fine-mapping, regardless of the amount of missing data and the trait correlation; both methods have similar accuracy when there are no shared causal variants. Causal variants were simulated for two traits with models defined by SNP groups from the IL2RA region. We vary sample size when the traits share a causal variant (a) and when they do not share any causal variants (b). At fixed sample size N = 3000, we vary the proportion of missing data for one trait (c) and vary the trait correlation (d). In a, c and d, trait 1 has causal variants A+C and trait 2 has causal variants A+D; the shared variant A has the same effect size for both traits: βA = log(1.4) and βC = βD = log(1.25). In a and b there are no missing data and the sample size varies from 1000 to 5000. In c the sample size is fixed at 3000 and the proportion of missing data for the trait with causal variants A+D varies from 0 to 0.5. In d the sample size is fixed at 3000 and the correlation between traits varies. In b, trait 1 has causal variants A+D with βA = log(1.25) and βD = log(1.25), while trait 2 has a single causal variant C with βC = log(1.25). Results are based on 300 replications. Source data are provided in Supplementary Data 1, Supplementary Data 1.2, 1.3, 1.6, 1.7.
Resolution comparison between single-trait fine-mapping and flashfm.
| Trait 1 (A+D); Trait 2 (A+C); vary sample size | | | | Trait 1 (A+D); Trait 2 (A+C); vary proportion of missing trait 1 data | |
|---|---|---|---|---|---|
| Sample size | Median percentage size reduction | Single-trait group A coverage | Multi-trait group A coverage | Trait 1 proportion missing | Median percentage size reduction |
| 1000 | 0 | 1 | 0.986 | 0 | 28.5 |
| 2000 | 10.5 | 0.997 | 0.99 | 0.1 | 31.4 |
| 3000 | 28.5 | 1 | 1 | 0.2 | 27.8 |
| 4000 | 32.5 | 1 | 0.997 | 0.3 | 18.8 |
| 5000 | 33.3 | 1 | 1 | 0.4 | 16.7 |
| | | | | 0.5 | 10.5 |
When traits share a causal variant, flashfm tends to yield smaller SNP groups than those from single-trait fine-mapping, regardless of the amount of missing data and the trait correlation; both methods have similar resolution and accuracy when there are no shared causal variants. In simulations with a shared causal variant A (trait 1 is A+D, trait 2 is A+C), βA = log(1.4) for both traits 1 and 2; trait 1 has a second causal variant D and trait 2 has a second causal variant C, both with β = log(1.25). In the non-shared causal variant setting (A+D, C), all causal variants have β = log(1.25). Traits 1 and 2 have correlation 0.4 and were both measured on all individuals, unless otherwise specified. When proportion missing data and trait correlation vary, sample size is 3000. The region has 345 SNPs and was simulated to mimic the LD structure of the IL2RA region, 10p-6030000-6220000 (GRCh37/hg19). Results are based on 300 replications.
Median flashfm running time (with second and third quartiles), in seconds.
| Number of Traits | 250-SNP Region (67 kb) | 500-SNP Region (144 kb) | 1000-SNP Region (312 kb) |
|---|---|---|---|
| 2 | 2 (1, 7) | 5 (2, 15) | 5 (1, 16) |
| 3 | 13 (5, 33) | 15 (8, 40) | 16 (5, 59) |
| 4 | 435 (49, 2173) | 583 (116, 1790) | 168 (32, 740) |
Flashfm was run using cpp = 0.99 and single-trait fine-mapping results from JAM, using the extended version (JAMexpandedCor.multi) in the flashfm package. Median time was measured over 100 replications in simulations of 2, 3, and 4 traits having correlation 0.4 and sample size 5000. The regions were subsets of the CTLA4 region 2q-204446258-204816382 (GRCh37/hg19).
Probabilities describing the relationship between flashfm ranks of causal variants when the trait correlation is mis-specified.
| | Trait 1 (E+G) | | | |
|---|---|---|---|---|
| | Pr(matched ranks) | | Pr(matched or improved ranks) | |
| Trait correlation shift | rs1980422/E | rs3087243/G | rs1980422/E | rs3087243/G |
| −0.2 | 0.870 | 0.923 | 0.960 | 0.950 |
| −0.1 | 0.903 | 0.950 | 0.970 | 0.967 |
| 0.1 | 0.897 | 0.950 | 0.933 | 0.983 |
| 0.2 | 0.793 | 0.900 | 0.857 | 0.947 |
Two traits were simulated to have causal variants E+G and E+H and trait correlation 0.4; sample size is N = 3000. Comparisons are made between flashfm results using the estimated trait correlation as input and flashfm results with this trait correlation estimate shifted upwards/downwards by 0.1 or 0.2. The region has 1231 SNPs and was simulated to mimic the LD structure of the CTLA4 region, 2q-204446258-204816382 (GRCh37/hg19). Results are based on 300 replications.
Comparison of probabilities that causal variants have rank 10 or less, from flashfm and fastPAINTOR, varying sample size.
| | Trait 1 (A+D) | | | | Trait 3 (I) | |
|---|---|---|---|---|---|---|
| | Pr(1 or more cvs rank <= 10) | | Pr(Both cvs rank <= 10) | | Pr(rank cv <= 10) | |
| Sample size | flashfm | fastPAINTOR | flashfm | fastPAINTOR | flashfm | fastPAINTOR |
| 1000 | 0.787 | 0.393 | 0.130 | 0.057 | 0.943 | 0.557 |
| 2000 | 0.890 | 0.733 | 0.563 | 0.243 | 0.983 | 0.750 |
| 3000 | 0.910 | 0.873 | 0.750 | 0.413 | 0.997 | 0.850 |
| 4000 | 0.937 | 0.917 | 0.837 | 0.560 | 0.997 | 0.903 |
| 5000 | 0.957 | 0.940 | 0.897 | 0.670 | 1 | 0.943 |
Flashfm tends to have higher probabilities than those from fastPAINTOR, especially for detecting all (multiple) causal variants of a trait. Three traits were simulated to have causal variants A+D, A+C+E, and I, with trait correlation 0.4. Sample size ranges from N = 1000 to 5000. The region has 345 SNPs and was simulated to mimic the LD structure of the IL2RA region, 10p-6030000-6220000 (GRCh37/hg19). Results are based on 300 replications.
Comparison of probabilities that causal variants (cvs) have rank 10 or less, from flashfm and fastPAINTOR, varying sample size.
| | Trait 1 (E+G) | | | |
|---|---|---|---|---|
| | Pr(1 or more cvs rank <= 10) | | Pr(Both cvs rank <= 10) | |
| Sample size | flashfm | fastPAINTOR | flashfm | fastPAINTOR |
| 1000 | 0.797 | 0.493 | 0.243 | 0.083 |
| 2000 | 0.937 | 0.757 | 0.530 | 0.187 |
| 3000 | 0.987 | 0.823 | 0.677 | 0.303 |
| 4000 | 1.000 | 0.870 | 0.847 | 0.403 |
| 5000 | 0.997 | 0.887 | 0.867 | 0.457 |
For all sample sizes, flashfm consistently has larger probabilities than those from fastPAINTOR. Flashfm has roughly two to three times the probability of fastPAINTOR for both causal variants to have rank 10 or lower (for example, 0.867 versus 0.457 at N = 5000). Two traits were simulated to have causal variants E+G and E+H, with trait correlation 0.4. Sample size ranges from N = 1000 to 5000. The region has 1231 SNPs and was simulated to mimic the LD structure of the CTLA4 region, 2q-204446258-204816382 (GRCh37/hg19). Results are based on 300 replications.
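The relative advantage for ranking both causal variants in the top 10 can be checked directly from the tabulated probabilities. The short sketch below only restates the "Pr(Both cvs rank <= 10)" columns of the table above and divides them; the variable names are illustrative.

```python
# Values copied from the "Pr(Both cvs rank <= 10)" columns of the table above.
sample_sizes = [1000, 2000, 3000, 4000, 5000]
flashfm_both = [0.243, 0.530, 0.677, 0.847, 0.867]
fastpaintor_both = [0.083, 0.187, 0.303, 0.403, 0.457]

# Ratio of flashfm to fastPAINTOR probability at each sample size.
ratios = {
    n: f / p
    for n, f, p in zip(sample_sizes, flashfm_both, fastpaintor_both)
}

for n, ratio in ratios.items():
    print(f"N = {n}: flashfm / fastPAINTOR = {ratio:.2f}")
```

The ratio is largest at the smallest sample size and shrinks as N grows, staying between about 1.9 and 2.9 across the five sample sizes.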
Regions with top models chosen by stepwise (SW), independent fine-mapping and Flashfm where there is a noticeable reduction in SNP group sizes and/or PP of top model.
| Region | Trait | Stepwise model | Independent | | Flashfm | | Change by Flashfm | |
|---|---|---|---|---|---|---|---|---|
| | | | Model (group size) | PP | Model (group size) | PP | PP gain | Group reduction |
| 1:55517883-55674945 (PCSK9, USP24) | LDL | rs11804420/A | A:rs45613943 (4) | 0.5 | A:rs45613943 (3) | 0.63 | 0.13 | A = 25% |
| | TC | rs11804420/A | A:rs45613943 (4) | 0.6 | A:rs45613943 (3) | 0.76 | 0.12 | |
| 2:62716187-62887884 (TMEM17) | ALP | rs13403582/B | B:rs7580494 (8) | 0.66 | B:rs6750204 (5) | 0.73 | 0.07 | J = 0; B = 38% |
| | PLT | rs765799086/J | J:rs765799086 (1) | 0.46 | B + J; J:rs765799086 (1) | 0.62 | 0.16 | |
| 15:58718136-58742605 (LIPC) | HDL | rs1800588/G | G:rs8033940 (5) | 0.42 | G:rs1800588 (4) | 0.52 | 0.10 | A = 20% |
| | TG | rs1077835/G | G:rs8033940 (5) | 0.56 | G:rs1800588 (4) | 0.66 | 0.10 | |
| 16:441156-557188 (LOC100134368, NME4, DECR2, RAB11FIP3) | MCV | rs75167983/L + rs150717215/C + rs147633052/A | L + C + A + B; L:rs144739959 (4), C:rs150717215 (2), A:rs147633052 (14), B:rs116567883 (1) | 0.42 | L + C + A + B; L:rs75167983 (1), C:rs150717215 (2), A:rs147633052 (14), B:rs116567883 (1) | 0.46 | 0.04 | L = 75%; C = 0; A = 0; B = 0; D = 0 |
| | MCH | rs75167983/L + rs150717215/C + rs147633052/A + rs116567883/B | L + C + A + B | 0.15 | L + C + A + B | 0.26 | 0.11 | |
| | | | L + C + A + B + D; D:rs553374841 (2) | 0.15 | L + C + A + B + D; D:rs553374841 (2) | 0.16 | 0.01 | |
| | Bilirubin | rs75167983/L | L:rs144739959 (4) | 0.45 | L:rs75167983 (1) | 0.63 | 0.18 | |
| 19:45380937-45441453 (NECTIN2, TOMM40, APOE, APOC1, APOC1P1) | LDL | rs7412/B + rs34215622/V + rs61357706/E + rs429358/L + rs367640607/D2 | E + L + B + V + D2; E:rs113152469 (5), L:rs429358 (1), B:rs61679753 (2), V:rs34215622 (1), D2:rs367640607 (2) | 0.41 | E + L + B + V + D2; E:rs113152469 (5), L:rs429358 (1), B:rs7412 (1), V:rs34215622 (1), D2:rs367640607 (2) | 0.40 | 0.00 | B = 50%; E2 = 25%; E = 0; L = 0; V = 0; D2 = 0 |
| | TC | rs7412/B + rs34215622/V + rs429358/L | E + L + B + V | 0.26 | E + L + B + V | 0.41 | 0.15 | |
| | | | E + L + B + V + D2 | 0.25 | E + L + B + V + D2 | 0.31 | 0.06 | |
| | TG | rs12721054/I + rs5112/X | E + I + Y + E2; I:rs12721054 (1), Y:rs7260330 (3), E2:rs12721051 (4) | 0.42 | E + I + X + E2; I:rs12721054 (1), X:rs5112 (1), E2:rs12721051 (3) | 0.63 | 0.21 | |
| | HDL | rs75627662/A | A:rs75627662 (1) | 0.38 | B:rs7412 (1) | 0.42 | 0.03 | |
Each row summarises results for a single region, defined by chromosome, start and end base-pair position and nearby gene(s). Each cell lists the SNP groups in a model; model A+B indicates all 2-SNP models with one SNP from group A and one SNP from group B. The number of SNPs in each group is given in brackets beside each group in the model. Here we list a representative SNP from each group; rs IDs are from build GRCh37/hg19. The SNPs belonging to each group and their functional annotations are given in Supplementary Data 1.22.
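The "Group reduction" percentages appear consistent with the relative decrease in SNP group size from independent fine-mapping to flashfm. The sketch below assumes that definition (it is an inference from the tabulated group sizes, not a formula stated in the source) and uses a hypothetical helper name with group sizes copied from the table.

```python
def group_reduction(size_independent, size_flashfm):
    """Percentage decrease in SNP group size (assumed definition)."""
    return 100.0 * (size_independent - size_flashfm) / size_independent


# (independent group size, flashfm group size), copied from the table above.
examples = {
    "1:55517883-55674945, group A": (4, 3),
    "2:62716187-62887884, group B": (8, 5),
    "15:58718136-58742605, group G": (5, 4),
    "16:441156-557188, group L": (4, 1),
}

for label, (before, after) in examples.items():
    print(f"{label}: {group_reduction(before, after):.1f}% smaller")
```

Under this assumption the group B value of 37.5% matches the tabulated 38% after rounding, and groups whose size is unchanged give a reduction of 0.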
Fig. 3 Fine-mapping of signals for four lipid traits in region 19:45380937-45441453.
The −log10(p) values for SNPs in the top SNP groups for a LDL; b total cholesterol (TC); c triglycerides (TG); d HDL are shown for both FINEMAP and flashfm. The two methods agree on a 5-SNP model for LDL (a) and a 4-SNP model for TC (b). The top model for TG (c) has 4 SNPs under both methods but differs in one SNP group; FINEMAP prefers the 3-SNP group Y (its SNPs lie very near one another and so appear as one) and flashfm selects the single-SNP group X (mean r2 of SNPs in Y with X is 0.315). For HDL (d), a different single-SNP model was selected by the two methods; FINEMAP favoured group A, whereas flashfm selected group B. The solid coloured circles show SNPs that belong to the SNP groups constructed by both methods, and the empty coloured circles represent SNPs that are only in the FINEMAP SNP group; solid grey circles show all other SNPs in the region. In c and d an X represents a SNP that appeared in a top model for flashfm and not FINEMAP, and empty circles indicate SNPs that appeared in top models for FINEMAP and not flashfm. Position is given according to hg19/build 37. Some of the genes in this region include APOE, APOC1 and TOMM40.