Xihao Li, Zilin Li, Hufeng Zhou, Sheila M Gaynor, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K Arnett, Stella Aslibekyan, Christie M Ballantyne, Lawrence F Bielak, John Blangero, Eric Boerwinkle, Donald W Bowden, Jai G Broome, Matthew P Conomos, Adolfo Correa, L Adrienne Cupples, Joanne E Curran, Barry I Freedman, Xiuqing Guo, George Hindy, Marguerite R Irvin, Sharon L R Kardia, Sekar Kathiresan, Alyna T Khan, Charles L Kooperberg, Cathy C Laurie, X Shirley Liu, Michael C Mahaney, Ani W Manichaikul, Lisa W Martin, Rasika A Mathias, Stephen T McGarvey, Braxton D Mitchell, May E Montasser, Jill E Moore, Alanna C Morrison, Jeffrey R O'Connell, Nicholette D Palmer, Akhil Pampana, Juan M Peralta, Patricia A Peyser, Bruce M Psaty, Susan Redline, Kenneth M Rice, Stephen S Rich, Jennifer A Smith, Hemant K Tiwari, Michael Y Tsai, Ramachandran S Vasan, Fei Fei Wang, Daniel E Weeks, Zhiping Weng, James G Wilson, Lisa R Yanek, Benjamin M Neale, Shamil R Sunyaev, Gonçalo R Abecasis, Jerome I Rotter, Cristen J Willer, Gina M Peloso, Pradeep Natarajan, Xihong Lin.
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
Year: 2020 PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1 | STAAR workflow.
a, Prepare the input data of STAAR, including genotypes, phenotypes, covariates and a (sparse) genetic relatedness matrix. b, Annotate all variants in the genome and calculate the annotation principal components for different classes of variant function. c, Define two types of variant sets: gene-centric analysis, which groups variants into functional genomic elements for each protein-coding gene, and genetic region analysis, which uses agnostic sliding windows. d, Estimate STAAR statistics for each variant set. e, Obtain STAAR-O P-values for all variant sets defined in c and report significant findings.
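Steps d and e reduce several per-test P-values for a variant set to a single omnibus P-value. The caption does not spell out how this is done. One common way to merge dependent P-values, and the idea behind ACAT-style tests, is the Cauchy combination. The following is a minimal sketch under that assumption; the function name `cauchy_combine` and the equal default weights are illustrative and are not taken from the STAAR software.

```python
import math

def cauchy_combine(pvalues, weights=None):
    """Combine P-values with the Cauchy combination rule.

    Assumes every P-value lies strictly between 0 and 1 and that the
    weights are non-negative and sum to 1 (equal weights by default).
    """
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)
    # Map each P-value to a standard Cauchy variate and take the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Convert the combined statistic back to a P-value via the Cauchy tail.
    return 0.5 - math.atan(t) / math.pi

# Hypothetical P-values from three tests of the same variant set.
combined = cauchy_combine([1e-6, 0.5, 0.8])
```

The combined value is driven mostly by the smallest input, which is why this rule suits an omnibus test: one strongly significant component is not diluted by several null ones.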
Figure 2 | Correlation heatmap of functional annotation scores.
The figure shows pairwise correlations between 76 individual and integrative functional annotations using variants from the pooled samples of lipid traits in the TOPMed data. The cells in the visualization are colored by Pearson’s correlation coefficient values with deeper colors indicating higher positive (red) or negative (blue) correlations. Each annotation principal component (aPC) is the first PC calculated from the set of individual functional annotations that measure similar biological function. These aPCs are then transformed into the PHRED-scaled scores for each variant across the genome (Online Methods).
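The PHRED-scaling step mentioned in the caption can be illustrated with a short sketch. It assumes the usual rank-based convention, in which a variant ranked r-th highest out of M receives the score -10 * log10(r / M); the exact definition is in the Online Methods, which are not reproduced here. The function name `phred_scale` and the input values are hypothetical, and ties are not handled.

```python
import math

def phred_scale(raw_scores):
    """Rank-based PHRED scaling: higher raw score -> higher PHRED score."""
    m = len(raw_scores)
    # Indices of the variants ordered from highest to lowest raw score.
    order = sorted(range(m), key=lambda i: raw_scores[i], reverse=True)
    phred = [0.0] * m
    for rank, idx in enumerate(order, start=1):
        phred[idx] = -10.0 * math.log10(rank / m)
    return phred

# Hypothetical first-PC scores for four variants.
scores = phred_scale([0.2, 3.1, -1.0, 0.9])
```

Under this convention, a score of 10 places a variant in the top 10% and a score of 20 in the top 1%, which puts aPCs built from different annotation classes on a common scale.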
Gene-centric analysis results of both unconditional analysis and analysis conditional on known common and low-frequency variants.
12,316 discovery samples, 17,822 replication samples and 30,138 pooled samples from the TOPMed program were considered in the analysis. Results for the conditionally significant genes (unconditional STAAR-O ; conditional STAAR-O ) using discovery samples are presented in the table. Chr (chromosome); Category (functional category); #SNV (number of rare variants (MAF < 1%) of the particular functional category in the gene); STAAR-O (STAAR-O P-value); LDL-C (low-density lipoprotein cholesterol); HDL-C (high-density lipoprotein cholesterol); TG (triglycerides); TC (total cholesterol); Variants Adjusted (adjusted variants in conditional analysis).
| Trait | Gene | Chr | Category | Discovery #SNV | Discovery STAAR-O (Unconditional) | Discovery STAAR-O (Conditional) | Replication #SNV | Replication STAAR-O (Unconditional) | Replication STAAR-O (Conditional) | Pooled #SNV | Pooled STAAR-O (Unconditional) | Pooled STAAR-O (Conditional) | Variants Adjusted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | 1 | pLoF | 5 | 3.09E-38 | 1.94E-07 | 8 | 6.97E-27 | 5.29E-10 | 9 | 4.59E-65 | 7.52E-17 | rs28362286, rs28362263, rs11591147, rs12117661 |
| | | 2 | pLoF | 11 | 1.91E-14 | 2.38E-14 | 5 | 1.97E-09 | 1.76E-09 | 16 | 3.91E-21 | 4.08E-21 | rs934197 |
| | | 1 | missense | 92 | 1.09E-16 | 2.65E-08 | 129 | 1.90E-06 | 1.15E-06 | 167 | 2.11E-15 | 1.14E-14 | rs28362286, rs28362263, rs11591147, rs12117661 |
| | | 7 | missense | 174 | 1.29E-07 | 3.83E-07 | 219 | 2.19E-03 | 3.28E-03 | 293 | 3.25E-10 | 1.58E-09 | rs10234070, rs73107473, rs2072183, rs41279633, rs17725246, rs2073547, rs10260606, rs217386, rs7791240, rs2300414 |
| | | 7 | disruptive missense | 94 | 3.15E-09 | 9.27E-09 | 129 | 1.46E-04 | 2.59E-04 | 173 | 8.05E-12 | 4.02E-11 | rs10234070, rs73107473, rs2072183, rs41279633, rs17725246, rs2073547, rs10260606, rs217386, rs7791240, rs2300414 |
| | | 19 | missense | 54 | 3.11E-10 | 9.88E-11 | 58 | 6.61E-05 | 3.47E-04 | 88 | 1.07E-13 | 2.02E-12 | rs7412, rs429358 |
| | | 11 | pLoF | 5 | 2.20E-07 | 6.82E-07 | 6 | 5.73E-18 | 2.89E-17 | 7 | 3.18E-23 | 4.51E-22 | rs66505542 |
| | | 11 | pLoF | 5 | 1.10E-14 | 5.53E-14 | 6 | 2.67E-49 | 2.73E-46 | 7 | 3.98E-56 | 1.04E-52 | rs66505542, rs964184, rs7350481 |
| | | 1 | pLoF | 5 | 4.60E-33 | 2.04E-10 | 8 | 1.83E-25 | 9.74E-11 | 9 | 9.83E-58 | 4.23E-20 | rs28362286, rs11591147, rs191448952 |
| | | 2 | pLoF | 11 | 7.29E-13 | 8.78E-13 | 5 | 2.62E-09 | 2.30E-09 | 16 | 9.76E-20 | 1.01E-19 | rs934197 |
| | | 1 | missense | 92 | 6.00E-15 | 1.11E-06 | 131 | 2.14E-05 | 1.13E-05 | 169 | 5.18E-12 | 3.16E-12 | rs28362286, rs11591147, rs191448952 |
| | | 18 | missense | 62 | 9.61E-08 | 4.34E-06 | 68 | 3.45E-04 | 1.47E-01 | 101 | 2.04E-09 | 5.62E-04 | rs4939883, rs7241918, rs149615216 |
Burden test P-value.
Figure 3 | Genetic region (2-kb sliding window) unconditional analysis results of LDL-C in discovery phase using the TOPMed cohort.
a, Manhattan plot showing the associations of 2.66 million 2-kb sliding windows for LDL-C versus -log10(P) of STAAR-O. The horizontal line indicates a genome-wide P-value threshold of (n = 12,316). b, Quantile-quantile plot of 2-kb sliding window STAAR-O P-values for LDL-C (n = 12,316). c, Genetic landscape of the windows significantly associated with LDL-C that are located in the 150-kb region on chromosome 19. Four statistical tests were compared: Burden, SKAT, ACAT-V and STAAR-O. A dot indicates that the sliding window at this location is significant using the statistical test that the color of the dot represents (n = 12,316). d, Scatterplot of P-values for the 2-kb sliding windows comparing STAAR-O with the Burden, SKAT and ACAT-V tests. Each dot represents a sliding window, with the x-axis showing the -log10(P) of the conventional test and the y-axis showing the -log10(P) of STAAR-O (n = 12,316).
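The quantile-quantile plot in panel b compares observed P-values with those expected under the null. A minimal sketch of how such plot coordinates are commonly computed follows; the plotting positions (i - 0.5) / n are one standard convention, assumed here rather than taken from the paper, and the input P-values are hypothetical.

```python
import math

def qq_points(pvalues):
    """Return (expected, observed) -log10(P) pairs for a QQ plot."""
    n = len(pvalues)
    # Observed values, sorted from least to most significant.
    observed = sorted(-math.log10(p) for p in pvalues)
    # Expected values under a uniform null, using (i - 0.5) / n positions.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Hypothetical P-values for four windows.
points = qq_points([0.5, 0.05, 0.005, 0.9])
```

Points that follow the diagonal indicate well-calibrated P-values; an upward departure confined to the most significant windows is the pattern expected when a few windows carry true signal.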
Genetic region (2-kb sliding window) analysis results of both unconditional analysis and analysis conditional on known common and low-frequency variants.
12,316 discovery samples, 17,822 replication samples and 30,138 pooled samples from the TOPMed program were considered in the analysis. Results for the conditionally significant sliding windows (unconditional STAAR-O ; conditional STAAR-O ) using discovery samples are presented in the table. Chr (chromosome); Start Location (start location of the 2-kb sliding window); End Location (end location of the 2-kb sliding window); #SNV (number of rare variants (MAF < 1%) in the 2-kb sliding window); STAAR-O (STAAR-O P-value); LDL-C (low-density lipoprotein cholesterol); TG (triglycerides); TC (total cholesterol); Variants Adjusted (adjusted variants in conditional analysis). Physical positions of each window are on build hg38.
| Trait | Chr | Start Location | End Location | Gene | Discovery #SNV | Discovery STAAR-O (Unconditional) | Discovery STAAR-O (Conditional) | Replication #SNV | Replication STAAR-O (Unconditional) | Replication STAAR-O (Conditional) | Pooled #SNV | Pooled STAAR-O (Unconditional) | Pooled STAAR-O (Conditional) | Variants Adjusted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | 55045498 | 55047497 | | 114 | 7.83E-09 | 1.06E-04 | 124 | 3.33E-06 | 4.10E-04 | 186 | 1.89E-15 | 2.90E-09 | rs28362286, rs28362263, rs11591147, rs12117661 |
| | 1 | 55046498 | 55048497 | | 124 | 5.32E-09 | 2.13E-05 | 130 | 1.79E-06 | 8.79E-05 | 191 | 1.33E-15 | 1.15E-09 | rs28362286, rs28362263, rs11591147, rs12117661 |
| | 19 | 44881528 | 44883527 | | 118 | 7.31E-10 | 1.81E-08 | 155 | 5.16E-04 | 2.42E-01 | 202 | 8.15E-08 | 5.26E-06 | rs7412, rs429358 |
| | 19 | 44882528 | 44884527 | | 104 | 2.08E-10 | 3.90E-09 | 133 | 1.23E-01 | 3.59E-01 | 176 | 1.38E-08 | 7.47E-07 | rs7412, rs429358 |
| | 19 | 44893528 | 44895527 | | 110 | 2.64E-19 | 2.33E-11 | 136 | 4.54E-09 | 2.60E-02 | 187 | 7.29E-29 | 7.62E-13 | rs7412, rs429358 |
| | 19 | 44894528 | 44896527 | | 120 | 2.44E-15 | 4.31E-11 | 153 | 7.62E-05 | 1.74E-02 | 205 | 6.73E-20 | 5.28E-13 | rs7412, rs429358 |
| | 19 | 44905528 | 44907527 | | 91 | 1.73E-10 | 1.64E-10 | 115 | 1.22E-02 | 4.91E-03 | 169 | 7.68E-12 | 9.00E-12 | rs7412, rs429358 |
| | 19 | 44906528 | 44908527 | | 84 | 1.67E-09 | 1.90E-10 | 115 | 8.65E-03 | 3.24E-03 | 165 | 8.34E-11 | 6.25E-12 | rs7412, rs429358 |
| | 19 | 44907528 | 44909527 | | 113 | 1.01E-09 | 1.97E-10 | 143 | 5.92E-03 | 3.58E-03 | 205 | 4.88E-11 | 8.71E-12 | rs7412, rs429358 |
| | 19 | 44908528 | 44910527 | | 140 | 6.30E-10 | 1.32E-10 | 152 | 4.14E-03 | 6.10E-03 | 228 | 2.40E-11 | 5.21E-12 | rs7412, rs429358 |
| | 19 | 44931528 | 44933527 | | 114 | 6.63E-09 | 7.60E-04 | 123 | 5.78E-11 | 5.40E-03 | 181 | 1.34E-19 | 4.15E-06 | rs7412, rs429358 |
| | 11 | 116828930 | 116830929 | | 125 | 4.63E-10 | 2.80E-09 | 155 | 1.35E-36 | 3.94E-34 | 207 | 7.32E-45 | 2.73E-41 | rs66505542, rs964184, rs7350481 |
| | 11 | 116829930 | 116831929 | | 109 | 3.61E-10 | 5.99E-10 | 140 | 2.85E-36 | 4.25E-34 | 187 | 5.75E-45 | 2.17E-41 | rs66505542, rs964184, rs7350481 |
| | 1 | 55045498 | 55047497 | | 114 | 3.05E-09 | 2.86E-07 | 130 | 3.12E-06 | 1.92E-06 | 189 | 2.22E-15 | 9.21E-14 | rs28362286, rs11591147, rs191448952 |
| | 1 | 55046498 | 55048497 | | 124 | 2.24E-09 | 2.06E-07 | 138 | 2.19E-06 | 1.34E-06 | 195 | 1.78E-15 | 7.04E-14 | rs28362286, rs11591147, rs191448952 |
| | 19 | 44893528 | 44895527 | | 111 | 9.35E-13 | 4.37E-07 | 146 | 1.12E-07 | 4.02E-01 | 196 | 7.57E-21 | 7.91E-08 | rs7412, rs429358 |
| | 19 | 44894528 | 44896527 | | 120 | 1.80E-09 | 1.99E-06 | 164 | 1.08E-04 | 8.31E-01 | 213 | 8.40E-14 | 2.19E-07 | rs7412, rs429358 |
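The window coordinates in the table above (for example 55045498 to 55047497, followed by 55046498 to 55048497) are consistent with 2-kb windows advanced in 1-kb steps. A minimal sketch of how such overlapping windows could be enumerated is given here; the 1-kb step is inferred from the listed coordinates rather than stated in this record, and `sliding_windows` is a hypothetical helper, not part of the STAAR software.

```python
def sliding_windows(region_start, region_end, size=2000, step=1000):
    """Enumerate inclusive (start, end) windows fully inside a region.

    size: window length in base pairs (2 kb, as in the table).
    step: offset between consecutive window starts (assumed to be 1 kb).
    """
    windows = []
    start = region_start
    while start + size - 1 <= region_end:
        windows.append((start, start + size - 1))
        start += step
    return windows

# Reproduces the two chromosome 1 windows listed in the table.
windows = sliding_windows(55045498, 55048497)
```

With a step of half the window size, every position away from the region edges falls in two windows, so a signal that straddles a window boundary is still captured whole by the neighbouring window.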