Le Huang, Jonathan D Rosen, Quan Sun, Jiawen Chen, Marsha M Wheeler, Ying Zhou, Yuan-I Min, Charles Kooperberg, Matthew P Conomos, Adrienne M Stilp, Stephen S Rich, Jerome I Rotter, Ani Manichaikul, Ruth J F Loos, Eimear E Kenny, Thomas W Blackwell, Albert V Smith, Goo Jun, Fritz J Sedlazeck, Ginger Metcalf, Eric Boerwinkle, Laura M Raffield, Alex P Reiner, Paul L Auer, Yun Li.
Abstract
Current publicly available tools that allow rapid exploration of linkage disequilibrium (LD) between markers (e.g., HaploReg and LDlink) are based on whole-genome sequence (WGS) data from 2,504 individuals in the 1000 Genomes Project. Here, we present TOP-LD, an online tool to explore LD inferred with high-coverage (∼30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. TOP-LD provides a significant upgrade compared to current LD tools, as the TOPMed WGS data provide a more comprehensive representation of genetic variation than the 1000 Genomes data, particularly for rare variants and in the specific populations that we analyzed. For example, TOP-LD encompasses LD information for 150.3, 62.2, and 36.7 million variants for European, African, and East Asian ancestral samples, respectively, offering a 2.6- to 9.1-fold increase in variant coverage compared to HaploReg 4.0 or LDlink. In addition, TOP-LD includes tens of thousands of structural variants (SVs). We demonstrate the value of TOP-LD in fine-mapping at the GGT1 locus associated with gamma glutamyltransferase in the African ancestry participants in UK Biobank. Beyond fine-mapping, TOP-LD can facilitate a wide range of applications that are based on summary statistics and estimates of LD. TOP-LD is freely available online.
Year: 2022 PMID: 35504290 PMCID: PMC9247832 DOI: 10.1016/j.ajhg.2022.04.006
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.043
Figure 1. Number of variants included in TOP-LD
(A) Comparison of autosomal variants with HaploReg 4.0 by population. Blue bars on the left show the total number of autosomal variants in HaploReg 4.0. Green and red indicate common (MAF ≥ 1%) and uncommon (MAF < 1%) autosomal variants in TOP-LD. Note that HaploReg 4.0 provides LD for ASN (Asian) with no separate information for EAS and SAS. Therefore, we used the same 13.7 million ASN variants for comparison in both EAS and SAS.
(B) Number of autosomal variants in TOP-LD, broken down by LD R² threshold. The majority of the variants have at least one LD proxy with R² ≥ 0.8.
(C) Number of chrX variants in TOP-LD, broken down by LD R² threshold.
(Note: LD information downloaded from HaploReg 4.0 does not contain chromosome X. Therefore, we compared TOP-LD with HaploReg 4.0 only for autosomal variants.)
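To make the "at least one LD proxy with R² ≥ 0.8" criterion concrete, the following is a minimal sketch of how pairwise R² could be computed from phased haplotypes and how variants with a qualifying proxy could be counted. It is an illustration under stated assumptions, not the TOP-LD pipeline: the functions `ld_r2` and `count_with_proxy`, the 0/1 haplotype encoding, and the toy variants are all hypothetical, and a real analysis would restrict comparisons to a genomic window rather than test all pairs.

```python
from typing import Dict, Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation (R^2) between two biallelic variants.

    Assumes each input is a list of 0/1 alleles, one entry per phased
    haplotype, in the same haplotype order for both variants.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                                  # allele frequency at A
    p_b = sum(hap_b) / n                                  # allele frequency at B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n   # joint frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic variant: R^2 is undefined, reported as 0 here
    d = p_ab - p_a * p_b                                  # LD coefficient D
    return d * d / denom


def count_with_proxy(haps: Dict[str, Sequence[int]], threshold: float = 0.8) -> int:
    """Count variants that have at least one other variant with R^2 >= threshold."""
    names = list(haps)
    tagged = set()
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if ld_r2(haps[x], haps[y]) >= threshold:
                tagged.update((x, y))
    return len(tagged)


# Toy data (illustrative only): v1 and v2 are perfectly correlated, v3 is independent.
toy_haps = {
    "v1": [1, 1, 0, 0],
    "v2": [1, 1, 0, 0],
    "v3": [1, 0, 1, 0],
}
n_tagged = count_with_proxy(toy_haps, threshold=0.8)
```

In this toy input, `v1` and `v2` tag each other while `v3` has no proxy, so two of the three variants would be counted in an R² ≥ 0.8 bin.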
Summary of SVs by population
| Population | Number of SVs | Number of SVs in LD w/SNVs | Number of SVs with MAF < 0.01 |
|---|---|---|---|
| EUR | 79,004 | 16,301 | 69,011 |
| AFR | 44,859 | 15,151 | 27,978 |
| SAS | 16,511 | 10,392 | 7,292 |
| EAS | 20,789 | 7,498 | 12,902 |
The column "Number of SVs in LD w/SNVs" counts SVs that have at least one SNV LD tag with R² ≥ 0.8.
Figure 2. Elapsed time (in seconds) for queries
The x axis represents the number of variants queried, and the y axis represents the elapsed time.
Figure 3. An example query result
The result contains two parts. The top part, “LD information from AFR,” lists LD information, where each line describes the LD between a query variant (rsID1) and one of its LD proxies (rsID2). The bottom part, “variant information from AFR,” shows basic information for each query variant. The bottom part shows that the user’s query includes four variants: rs334, rs8008208820, rs2462498, and rs12219304. Variants not included in the LD calculation have “none” records. For instance, rs8008208820 was not part of LD inference, so it has no LD proxies in the top part because no data are available. Records from SV inference are in blue and those from SNV data are in orange. Some variants may appear twice because they are included in both the SNV LD calculation and the SV LD calculation. In this example, rs12219304 appears twice, with MAF 0.0558 from the SNV source (second-to-last record, in orange) and MAF 0.0543 from the SV source (last record, in blue).
Summary statistics of the distinct working-truth signals at the GGT1 locus associated with gamma glutamyltransferase
| Signal | Variant | Position (hg38) | Effect allele | Unconditional p value | p value conditional on previous signals | Effect allele frequency |
|---|---|---|---|---|---|---|
| 1 | rs4049904 | 24609759 | G | 2.82e−61 | N/A | 10.27% |
| 2 | rs73404962 | 24598530 | G | 4.46e−29 | 2.00e−36 | 5.63% |
| 3 | rs743369 | 24588099 | A | 9.94e−36 | 7.51e−27 | 11.94% |
| 4 | rs6004193 | 24598329 | C | 4.23e−41 | 3.25e−19 | 18.27% |
| 5 | rs57719575 | 24609020 | C | 3.97e−38 | 1.98e−24 | 14.86% |
| 6 | rs3876101 | 24607291 | A | 2.66e−15 | 1.17e−13 | 35.45% |
| 7 | rs116161010 | 24585912 | T | 5.69e−17 | 7.70e−9 | 7.13% |
The p values are reported from the sequential conditional analysis. For example, we report the p value for rs73404962 conditional on rs4049904, the p value of rs743369 conditional on both rs4049904 and rs73404962, and so forth.
FINEMAP credible-set variants
| | | Variant 1 | Variant 2 | Variant 3 | Variant 4 | Variant 5 |
|---|---|---|---|---|---|---|
| 1000G reference | credible-set variant | rs4049904 | rs147866692 | rs570263050 | rs115231893 | 22:24649848:G:A (hg38) |
| 1000G reference | LD with working truth | 1 (w/rs4049904 itself) | 0.464 (w/rs4049904) | 0.606 (w/rs4049904) | 0.275 (w/rs4049904) | 0.434 (w/rs4049904) |
| TOP-LD reference | credible-set variant | rs4049904 | rs743369 | rs57719575 | rs2073397 | rs5751902 |
| TOP-LD reference | LD with working truth | 1 (w/rs4049904 itself) | 1 (w/rs743369 itself) | 1 (w/rs57719575 itself) | 0.83 (w/rs6004193) | 0.51 (w/rs6004193) |
The table shows the two five-variant credible sets produced by FINEMAP with either 1000G or TOP-LD as the LD reference. For each credible-set variant, we list the working-truth variant with which it has the highest LD, along with the corresponding LD R².
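The "LD with working truth" rows can be read as the output of a simple best-match lookup. The sketch below illustrates that lookup, assuming pairwise R² values are available in a dictionary keyed by unordered variant pairs. The function names, the variant labels (`cs1`, `cs2`, `t1`), and the R² values are hypothetical placeholders, not data from the study.

```python
from typing import Dict, FrozenSet, List, Tuple

LdTable = Dict[FrozenSet[str], float]


def pair_r2(a: str, b: str, ld: LdTable) -> float:
    """Look up R^2 for an unordered pair; a variant is in perfect LD with itself."""
    if a == b:
        return 1.0
    # Assumption: pairs missing from the table are treated as R^2 = 0.
    return ld.get(frozenset((a, b)), 0.0)


def best_truth_matches(
    credible_set: List[str], working_truth: List[str], ld: LdTable
) -> Dict[str, Tuple[str, float]]:
    """For each credible-set variant, return the working-truth variant with the highest R^2."""
    matches = {}
    for variant in credible_set:
        best = max(working_truth, key=lambda t: pair_r2(variant, t, ld))
        matches[variant] = (best, pair_r2(variant, best, ld))
    return matches


# Illustrative inputs only: labels and R^2 values are made up for this sketch.
toy_ld: LdTable = {
    frozenset(("cs1", "t1")): 0.7,
    frozenset(("cs1", "cs2")): 0.2,
}
toy_matches = best_truth_matches(
    credible_set=["cs1", "cs2"],
    working_truth=["t1", "cs2"],
    ld=toy_ld,
)
```

Here `cs2` is itself a working-truth variant, so it matches itself with R² = 1, which corresponds to the "itself" entries in the table. `cs1` is paired with `t1`, its highest-LD working-truth variant.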